DCT arithmetic device

ABSTRACT

A DCT processor is provided for performing at least one of a DCT operation and an inverse DCT operation on image data in unit blocks having different sizes. This DCT processor is provided with a bit slice circuit (102) for outputting, bit by bit, the pixel data input for each column or row; a first butterfly operation circuit (103) for subjecting the output data of the bit slice circuit (102) to a butterfly operation; a ROM address generation circuit (104) for generating continuous ROM addresses; an RAC (105) for reading the data corresponding to the ROM addresses from ROMs (ROM0 to ROM7) and accumulating the data by accumulation circuits (51a to 51h); and a second butterfly operation circuit (106) for subjecting the output data of the RAC (105) to a butterfly operation.

TECHNICAL FIELD

The present invention relates to a DCT processor which realizes the discrete cosine transform (hereinafter referred to as DCT) used for data compression in applications such as image signal processing and, more particularly, to a DCT processor which performs at least one of a DCT operation and an inverse DCT operation on image data in unit blocks having different sizes.

BACKGROUND ART

DCT is generally used for data compression of an image signal or the like. In data compression for video, compression utilizing intra-frame (spatial) correlation and compression utilizing inter-frame (temporal) correlation are generally both performed, and DCT corresponds to the former. DCT is a kind of frequency transform: although pixel values are distributed at random before the transform, relatively large values concentrate in the low-frequency components after it, so data compression is performed by removing the high-frequency components.

In DCT, initially, one image is divided into a plurality of unit blocks each having a predetermined shape and comprising a predetermined number of pixels (e.g., 8×8), and DCT is performed on every unit block. Two-dimensional DCT is executed by performing one-dimensional DCT twice. For example, the result of one-dimensional DCT performed on a unit block along its column direction is subjected to one-dimensional DCT along its row direction.

Further, the image signal compressed by DCT is decompressed by inverse DCT.
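The two-pass procedure described above can be sketched in a few lines of Python. This is only an illustration, not the circuit of the invention: the function names `dct_1d` and `dct_2d` are hypothetical, and the scaling is assumed to be that of the one-dimensional DCT of formula (3).

```python
import math

def dct_1d(x):
    """One-dimensional DCT of a sequence, using the scaling of formula (3)."""
    n = len(x)
    out = []
    for u in range(n):
        c_u = 1 / math.sqrt(2) if u == 0 else 1.0
        s = sum(x[i] * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                for i in range(n))
        out.append(math.sqrt(2 / n) * c_u * s)
    return out

def dct_2d(block):
    """Two-dimensional DCT of a block given as a list of rows.

    Pass 1 transforms every column; pass 2 transforms every row
    of the interim result.
    """
    n_rows, n_cols = len(block), len(block[0])
    # Pass 1: one-dimensional DCT along the column direction.
    cols = [dct_1d([block[r][c] for r in range(n_rows)])
            for c in range(n_cols)]
    interim = [[cols[c][r] for c in range(n_cols)] for r in range(n_rows)]
    # Pass 2: one-dimensional DCT along the row direction.
    return [dct_1d(row) for row in interim]

# A flat 8x8 block has energy only in the coefficient X(0,0).
flat = [[10] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

Because the block need not be square, the same sketch also covers N×M unit blocks; only the lengths passed to `dct_1d` differ between the two passes.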

Formulae (1) and (2) define two-dimensional DCT and two-dimensional inverse DCT for an N×N unit block, respectively.

$$X(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x(i,j)\, \cos\frac{(2i+1)u\pi}{2N}\, \cos\frac{(2j+1)v\pi}{2N} \qquad \text{formula (1)}$$

$$x(i,j) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, X(u,v)\, \cos\frac{(2i+1)u\pi}{2N}\, \cos\frac{(2j+1)v\pi}{2N} \qquad \text{formula (2)}$$

Further, formula (3) defines one-dimensional DCT, which is derived from formulae (1) and (2).

$$X(u) = \sqrt{\frac{2}{N}}\, C(u) \sum_{i=0}^{N-1} x(i)\, \cos\frac{(2i+1)u\pi}{2N} \qquad \text{formula (3)}$$

In these formulae, x(i,j) (i,j=0,1,2, . . . ,N−1) indicates pixels and X(u,v) (u,v=0,1,2, . . . ,N−1) indicates transform coefficients, where C(0)=1/√2 and C(u)=C(v)=1 for u,v=1,2, . . . ,N−1.
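Formula (3) can be evaluated numerically to tabulate the coefficient matrix for any N. The following is a minimal sketch under that reading; the function name `dct_matrix` is hypothetical and does not appear in the specification.

```python
import math

def dct_matrix(n):
    """Return the n x n coefficient matrix of the one-dimensional DCT.

    Row u, column i holds sqrt(2/n) * C(u) * cos((2i+1) * u * pi / (2n)),
    with C(0) = 1/sqrt(2) and C(u) = 1 otherwise, as in formula (3).
    """
    rows = []
    for u in range(n):
        c_u = 1 / math.sqrt(2) if u == 0 else 1.0
        scale = math.sqrt(2 / n) * c_u
        rows.append([scale * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                     for i in range(n)])
    return rows

# Print the first two rows for N=8, rounded to six decimals.
for row in dct_matrix(8)[:2]:
    print(" ".join(f"{v: .6f}" for v in row))
```

Calling `dct_matrix(n)` for n from 2 to 8 gives one coefficient matrix per supported row or column length.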

When N=8, the matrix operation of the one-dimensional DCT according to formula (3) is represented by

N=8:

$$\begin{pmatrix} X0 \\ X1 \\ X2 \\ X3 \\ X4 \\ X5 \\ X6 \\ X7 \end{pmatrix} = \begin{pmatrix}
0.353553 & 0.353553 & 0.353553 & 0.353553 & 0.353553 & 0.353553 & 0.353553 & 0.353553 \\
0.490393 & 0.415735 & 0.277785 & 0.097545 & -0.097545 & -0.277785 & -0.415735 & -0.490393 \\
0.461940 & 0.191342 & -0.191342 & -0.461940 & -0.461940 & -0.191342 & 0.191342 & 0.461940 \\
0.415735 & -0.097545 & -0.490393 & -0.277785 & 0.277785 & 0.490393 & 0.097545 & -0.415735 \\
0.353553 & -0.353553 & -0.353553 & 0.353553 & 0.353553 & -0.353553 & -0.353553 & 0.353553 \\
0.277785 & -0.490393 & 0.097545 & 0.415735 & -0.415735 & -0.097545 & 0.490393 & -0.277785 \\
0.191342 & -0.461940 & 0.461940 & -0.191342 & -0.191342 & 0.461940 & -0.461940 & 0.191342 \\
0.097545 & -0.277785 & 0.415735 & -0.490393 & 0.490393 & -0.415735 & 0.277785 & -0.097545
\end{pmatrix} \begin{pmatrix} x0 \\ x1 \\ x2 \\ x3 \\ x4 \\ x5 \\ x6 \\ x7 \end{pmatrix} \qquad \text{formula (4)}$$

When N=7, N=6, N=5, N=4, N=3, and N=2, the matrix operations of the one-dimensional DCT are represented by

N=7:

$$\begin{pmatrix} X0 \\ X1 \\ X2 \\ X3 \\ X4 \\ X5 \\ X6 \end{pmatrix} = \begin{pmatrix}
0.377964 & 0.377964 & 0.377964 & 0.377964 & 0.377964 & 0.377964 & 0.377964 \\
0.521121 & 0.417907 & 0.231921 & 0.000000 & -0.231921 & -0.417907 & -0.521121 \\
0.481588 & 0.118942 & -0.333269 & -0.534522 & -0.333269 & 0.118942 & 0.481588 \\
0.417907 & -0.231921 & -0.521121 & -0.000000 & 0.521121 & 0.231921 & -0.417907 \\
0.333269 & -0.481588 & -0.118942 & 0.534522 & -0.118942 & -0.481588 & 0.333269 \\
0.231921 & -0.521121 & 0.417907 & 0.000000 & -0.417907 & 0.521121 & -0.231921 \\
0.118942 & -0.333269 & 0.481588 & -0.534522 & 0.481588 & -0.333269 & 0.118942
\end{pmatrix} \begin{pmatrix} x0 \\ x1 \\ x2 \\ x3 \\ x4 \\ x5 \\ x6 \end{pmatrix} \qquad \text{formula (5)}$$

N=6:

$$\begin{pmatrix} X0 \\ X1 \\ X2 \\ X3 \\ X4 \\ X5 \end{pmatrix} = \begin{pmatrix}
0.408248 & 0.408248 & 0.408248 & 0.408248 & 0.408248 & 0.408248 \\
0.557678 & 0.408248 & 0.149429 & -0.149429 & -0.408248 & -0.557678 \\
0.500000 & 0.000000 & -0.500000 & -0.500000 & -0.000000 & 0.500000 \\
0.408248 & -0.408248 & -0.408248 & 0.408248 & 0.408248 & -0.408248 \\
0.288675 & -0.577350 & 0.288675 & 0.288675 & -0.577350 & 0.288675 \\
0.149429 & -0.408248 & 0.557678 & -0.557678 & 0.408248 & -0.149429
\end{pmatrix} \begin{pmatrix} x0 \\ x1 \\ x2 \\ x3 \\ x4 \\ x5 \end{pmatrix} \qquad \text{formula (6)}$$

N=5:

$$\begin{pmatrix} X0 \\ X1 \\ X2 \\ X3 \\ X4 \end{pmatrix} = \begin{pmatrix}
0.447214 & 0.447214 & 0.447214 & 0.447214 & 0.447214 \\
0.601501 & 0.371748 & 0.000000 & -0.371748 & -0.601501 \\
0.511667 & -0.195440 & -0.632456 & -0.195440 & 0.511667 \\
0.371748 & -0.601501 & -0.000000 & 0.601501 & -0.371748 \\
0.195440 & -0.511667 & 0.632456 & -0.511667 & 0.195440
\end{pmatrix} \begin{pmatrix} x0 \\ x1 \\ x2 \\ x3 \\ x4 \end{pmatrix} \qquad \text{formula (7)}$$

N=4:

$$\begin{pmatrix} X0 \\ X1 \\ X2 \\ X3 \end{pmatrix} = \begin{pmatrix}
0.500000 & 0.500000 & 0.500000 & 0.500000 \\
0.653281 & 0.270598 & -0.270598 & -0.653281 \\
0.500000 & -0.500000 & -0.500000 & 0.500000 \\
0.270598 & -0.653281 & 0.653281 & -0.270598
\end{pmatrix} \begin{pmatrix} x0 \\ x1 \\ x2 \\ x3 \end{pmatrix} \qquad \text{formula (8)}$$

N=3:

$$\begin{pmatrix} X0 \\ X1 \\ X2 \end{pmatrix} = \begin{pmatrix}
0.577350 & 0.577350 & 0.577350 \\
0.707107 & 0.000000 & -0.707107 \\
0.408248 & -0.816497 & 0.408248
\end{pmatrix} \begin{pmatrix} x0 \\ x1 \\ x2 \end{pmatrix} \qquad \text{formula (9)}$$

N=2:

$$\begin{pmatrix} X0 \\ X1 \end{pmatrix} = \begin{pmatrix}
0.707107 & 0.707107 \\
0.707107 & -0.707107
\end{pmatrix} \begin{pmatrix} x0 \\ x1 \end{pmatrix} \qquad \text{formula (10)}$$

On the other hand, the matrix operation of the one-dimensional inverse DCT in the case where N=8 is represented by

N=8:

$$\begin{pmatrix} x0 \\ x1 \\ x2 \\ x3 \\ x4 \\ x5 \\ x6 \\ x7 \end{pmatrix} = \begin{pmatrix}
0.353553 & 0.490393 & 0.461940 & 0.415735 & 0.353553 & 0.277785 & 0.191342 & 0.097545 \\
0.353553 & 0.415735 & 0.191342 & -0.097545 & -0.353553 & -0.490393 & -0.461940 & -0.277785 \\
0.353553 & 0.277785 & -0.191342 & -0.490393 & -0.353553 & 0.097545 & 0.461940 & 0.415735 \\
0.353553 & 0.097545 & -0.461940 & -0.277785 & 0.353553 & 0.415735 & -0.191342 & -0.490393 \\
0.353553 & -0.097545 & -0.461940 & 0.277785 & 0.353553 & -0.415735 & -0.191342 & 0.490393 \\
0.353553 & -0.277785 & -0.191342 & 0.490393 & -0.353553 & -0.097545 & 0.461940 & -0.415735 \\
0.353553 & -0.415735 & 0.191342 & 0.097545 & -0.353553 & 0.490393 & -0.461940 & 0.277785 \\
0.353553 & -0.490393 & 0.461940 & -0.415735 & 0.353553 & -0.277785 & 0.191342 & -0.097545
\end{pmatrix} \begin{pmatrix} X0 \\ X1 \\ X2 \\ X3 \\ X4 \\ X5 \\ X6 \\ X7 \end{pmatrix} \qquad \text{formula (11)}$$

When N=7, N=6, N=5, N=4, N=3, and N=2, the matrix operations of the one-dimensional inverse DCT are represented by

N=7:

$$\begin{pmatrix} x0 \\ x1 \\ x2 \\ x3 \\ x4 \\ x5 \\ x6 \end{pmatrix} = \begin{pmatrix}
0.377964 & 0.521121 & 0.481588 & 0.417907 & 0.333269 & 0.231921 & 0.118942 \\
0.377964 & 0.417907 & 0.118942 & -0.231921 & -0.481588 & -0.521121 & -0.333269 \\
0.377964 & 0.231921 & -0.333269 & -0.521121 & -0.118942 & 0.417907 & 0.481588 \\
0.377964 & 0.000000 & -0.534522 & -0.000000 & 0.534522 & 0.000000 & -0.534522 \\
0.377964 & -0.231921 & -0.333269 & 0.521121 & -0.118942 & -0.417907 & 0.481588 \\
0.377964 & -0.417907 & 0.118942 & 0.231921 & -0.481588 & 0.521121 & -0.333269 \\
0.377964 & -0.521121 & 0.481588 & -0.417907 & 0.333269 & -0.231921 & 0.118942
\end{pmatrix} \begin{pmatrix} X0 \\ X1 \\ X2 \\ X3 \\ X4 \\ X5 \\ X6 \end{pmatrix} \qquad \text{formula (12)}$$

N=6:

$$\begin{pmatrix} x0 \\ x1 \\ x2 \\ x3 \\ x4 \\ x5 \end{pmatrix} = \begin{pmatrix}
0.408248 & 0.557678 & 0.500000 & 0.408248 & 0.288675 & 0.149429 \\
0.408248 & 0.408248 & 0.000000 & -0.408248 & -0.577350 & -0.408248 \\
0.408248 & 0.149429 & -0.500000 & -0.408248 & 0.288675 & 0.557678 \\
0.408248 & -0.149429 & -0.500000 & 0.408248 & 0.288675 & -0.557678 \\
0.408248 & -0.408248 & -0.000000 & 0.408248 & -0.577350 & 0.408248 \\
0.408248 & -0.557678 & 0.500000 & -0.408248 & 0.288675 & -0.149429
\end{pmatrix} \begin{pmatrix} X0 \\ X1 \\ X2 \\ X3 \\ X4 \\ X5 \end{pmatrix} \qquad \text{formula (13)}$$

N=5:

$$\begin{pmatrix} x0 \\ x1 \\ x2 \\ x3 \\ x4 \end{pmatrix} = \begin{pmatrix}
0.447214 & 0.601501 & 0.511667 & 0.371748 & 0.195440 \\
0.447214 & 0.371748 & -0.195440 & -0.601501 & -0.511667 \\
0.447214 & 0.000000 & -0.632456 & -0.000000 & 0.632456 \\
0.447214 & -0.371748 & -0.195440 & 0.601501 & -0.511667 \\
0.447214 & -0.601501 & 0.511667 & -0.371748 & 0.195440
\end{pmatrix} \begin{pmatrix} X0 \\ X1 \\ X2 \\ X3 \\ X4 \end{pmatrix} \qquad \text{formula (14)}$$

N=4:

$$\begin{pmatrix} x0 \\ x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix}
0.500000 & 0.653281 & 0.500000 & 0.270598 \\
0.500000 & 0.270598 & -0.500000 & -0.653281 \\
0.500000 & -0.270598 & -0.500000 & 0.653281 \\
0.500000 & -0.653281 & 0.500000 & -0.270598
\end{pmatrix} \begin{pmatrix} X0 \\ X1 \\ X2 \\ X3 \end{pmatrix} \qquad \text{formula (15)}$$

N=3:

$$\begin{pmatrix} x0 \\ x1 \\ x2 \end{pmatrix} = \begin{pmatrix}
0.577350 & 0.707107 & 0.408248 \\
0.577350 & 0.000000 & -0.816497 \\
0.577350 & -0.707107 & 0.408248
\end{pmatrix} \begin{pmatrix} X0 \\ X1 \\ X2 \end{pmatrix} \qquad \text{formula (16)}$$

N=2:

$$\begin{pmatrix} x0 \\ x1 \end{pmatrix} = \begin{pmatrix}
0.707107 & 0.707107 \\
0.707107 & -0.707107
\end{pmatrix} \begin{pmatrix} X0 \\ X1 \end{pmatrix} \qquad \text{formula (17)}$$
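Since the forward matrix built from formula (3) is orthonormal, the inverse DCT matrix is its transpose, as can be seen by comparing formula (11) with formula (4). The sketch below checks this numerically with a round trip on a 7-element row; the helper names `dct_matrix`, `transpose`, and `matvec` are hypothetical.

```python
import math

def dct_matrix(n):
    """n x n forward DCT matrix with the scaling of formula (3)."""
    return [[math.sqrt(2 / n) * (1 / math.sqrt(2) if u == 0 else 1.0)
             * math.cos((2 * i + 1) * u * math.pi / (2 * n))
             for i in range(n)]
            for u in range(n)]

def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

x = [12, 50, 7, 255, 0, 31, 99]           # one row of N=7 pixel values
forward = dct_matrix(len(x))
X = matvec(forward, x)                     # forward DCT
x_back = matvec(transpose(forward), X)     # inverse DCT via the transpose
```

The recovered values in `x_back` agree with `x` up to floating-point rounding, for any N and not only for powers of 2.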

FIG. 8 is a block diagram illustrating the structure of an example of a conventional DCT processor. In FIG. 8, a DCT processor 1 comprises 8-bit input registers 2a, 2b, 2c, 2d, 2e, 2f, 2g and 2h for latching input image data; 8-bit holding registers 3a, 3b, 3c, 3d, 3e, 3f, 3g and 3h for latching the output data from the respective input registers 2a to 2h and, thereafter, shift-outputting the data, bit by bit, from the least significant bit (hereinafter referred to as “LSB”) of each output data; ROM accumulators (hereinafter referred to as “RACs”) 4a, 4b, 4c, 4d, 4e, 4f, 4g and 4h for accumulating the data in ROMs (Read Only Memories) 41a to 41h by accumulators 42a to 42h, with the output data from the respective holding registers 3a, 3b, 3c, 3d, 3e, 3f, 3g and 3h as 8-bit addresses; and output registers 5a, 5b, 5c, 5d, 5e, 5f, 5g and 5h for latching the output data from the respective RACs 4a to 4h and outputting these data.

Further, the respective RACs 4a to 4h comprise ROMs 41a to 41h, each having a table of 2⁸ data including the sums of the products obtained by multiplying the column coefficients in the matrix operation by the respective bits of the pixel data constituting the input column or row; and accumulators 42a to 42h for accumulating the outputs from the respective ROMs 41a to 41h.

The conventional DCT processor employs the DA (Distributed Arithmetic) method for the matrix operation. The DA method is efficient for product-sum operations with fixed coefficients. In this method, the product-sum operation between the input pixel data and the fixed coefficients is performed not in word units but in bit string units. A bit string comprising one bit from each input pixel data is used as an address, the partial product corresponding to this address is read from a ROM which stores the partial products as a table, and the partial products for the bit positions from the LSB to the MSB (Most Significant Bit) are accumulated to realize the product-sum operation with fixed coefficients. In this DCT processor, the partial products obtained by multiplying the bit strings constituted by the respective bits of the input pixel data of N or M pixels by the row coefficients of the DCT coefficients are stored as tables in the respective ROMs 41a to 41h of the RACs 4a to 4h, in association with the respective row coefficients of the DCT coefficients. By inputting the bit strings of N or M bits constituted by the respective bits of the input pixel data of N or M pixels as addresses to the respective ROMs 41a to 41h, the partial products are output from the ROMs 41a to 41h sequentially, from the LSB to the MSB of the respective pixel data, and accumulated, whereby the result of the one-dimensional DCT is obtained.
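The DA method described above can be sketched in software as follows. This is an illustrative model only: it assumes unsigned 8-bit pixel data, uses floating-point table entries instead of fixed-point ROM words, and weights each partial product by its bit position directly rather than by a hardware shift. The names `build_rom` and `da_dot` are hypothetical.

```python
import math

def dct_row(n, u):
    """Row u of the n-point DCT coefficient matrix (formula (3))."""
    scale = math.sqrt(2 / n) * (1 / math.sqrt(2) if u == 0 else 1.0)
    return [scale * math.cos((2 * i + 1) * u * math.pi / (2 * n))
            for i in range(n)]

def build_rom(coeffs):
    """Partial-product table: entry 'addr' is the sum of the coefficients
    whose bit is set in 'addr' (bit k of the address selects pixel k)."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]

def da_dot(rom, pixels, bits=8):
    """Bit-serial product-sum: one ROM lookup per bit position, LSB first."""
    acc = 0.0
    for b in range(bits):
        # Form the address from bit b of every pixel.
        addr = 0
        for k, p in enumerate(pixels):
            addr |= ((p >> b) & 1) << k
        # Accumulate the partial product with the weight of bit b.
        acc += rom[addr] * (1 << b)
    return acc

pixels = [52, 55, 61, 66, 70, 61, 64, 73]      # one row of 8-bit pixels
roms = [build_rom(dct_row(8, u)) for u in range(8)]
X = [da_dot(rom, pixels) for rom in roms]       # one-dimensional DCT result
```

Each of the eight tables here has 2⁸ = 256 entries, which matches the table size stated for the ROMs 41a to 41h; no multiplier is needed beyond the bit-position weighting.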

Next, the operation will be described.

Initially, the input register 2a latches 8-bit input pixel data, and a shift operation is performed in every input cycle, from the input register 2a to the input register 2b, from the input register 2b to the input register 2c, and so on, until pixel data is latched by all of the input registers 2a to 2h. Thereafter, the input registers 2a to 2h output the latched pixel data to the corresponding holding registers 3a to 3h. In parallel with the next 8 pieces of input pixel data being latched by the input registers 2a to 2h, the holding registers 3a to 3h output the latched 8-bit pixel data, bit by bit, from the LSB. With the 8-bit data output from the holding registers 3a to 3h as addresses, the ROMs 41a to 41h output the ROM data corresponding to these addresses. The accumulators 42a to 42h latch the 8-bit ROM data output from the corresponding ROMs 41a, 41b, 41c, 41d, 41e, 41f, 41g, and 41h, and output 8-bit data. The output registers 5a to 5h corresponding to the RACs 4a to 4h latch the data output from the accumulators 42a to 42h, respectively, and perform a sequential shift operation from the output register 5h to the output register 5g, from the output register 5g to the output register 5f, and so on, to output the latched data.

When performing two-dimensional DCT on pixel data in a unit block comprising 8×8 pixels by using the DCT processor 1, initially, the series of operations is performed eight times, once for every eight pieces of pixel data in the column direction, to obtain 64 interim results and, thereafter, one-dimensional DCT is performed on the 64 interim results in the row direction.

However, in order to perform one-dimensional DCT or inverse DCT on a unit block comprising 8×8 pixels, the conventional DCT processor must have 8 tables each containing 256 pieces of ROM data when each input pixel data has 8 bits. When both DCT and inverse DCT are performed by one DCT processor, the processor must have 8 tables each containing 512 pieces of ROM data. Further, in recent years, there has been a demand for a variable unit block size, depending on the video data compression standard. However, the above-described DCT processor is applicable only to pixel data having 8×8 pixels as a unit block. So, in order to process pixel data having, as a unit block, 7×7 pixels, 6×6 pixels, 5×5 pixels, and 4×4 pixels, DCT processors having seven tables of 256 pieces of ROM data, six tables of 128 pieces of ROM data, five tables of 64 pieces of ROM data, and four tables of 32 pieces of ROM data are required, respectively. Accordingly, in order to perform DCT and inverse DCT on pixel data in unit blocks each comprising arbitrarily selected N×M pixels, a plurality of DCT processors are required, whereby the circuit scale is considerably increased.

The present invention is made to solve the above-described problems, and it is an object of the present invention to provide a DCT processor having a relatively small circuit scale, which can perform DCT or inverse DCT on image data in unit blocks having different sizes.

DISCLOSURE OF THE INVENTION

The present invention is a DCT processor performing a one-dimensional DCT operation or a one-dimensional inverse DCT operation on pixel data of image data in unit blocks each comprising N×M pixels (N, M: arbitrary integers from 1 to 8). This DCT processor comprises: bit slice means for receiving the pixel data of the image data in each N×M unit block for each row or column, slicing, bit by bit, the respective pixel data constituting the input rows or columns, and outputting the sliced pixel data; control means for outputting a control signal which includes the number of input pixel data, that is, the number of pixel data constituting each input row or column, and a value indicating whether the DCT operation or the inverse DCT operation is to be performed; first butterfly operation means for subjecting the output data from the bit slice means to the butterfly operation and outputting the result of the butterfly operation in the case where the control signal output from the control means indicates that the number of input pixel data is a power of 2 and that the DCT operation is to be performed, and, in all other cases, performing no butterfly operation and outputting the output data of the bit slice means as it is; address generation means for generating addresses on the basis of bit strings obtained from the output data of the first butterfly operation means, together with the number of input pixel data and the value indicating whether the DCT operation or the inverse DCT operation is to be performed, both of which are included in the control signal; operation means having eight sets of multiplication result output means and accumulation means, the multiplication result output means outputting, in accordance with the above-described addresses, the results of multiplication to be used for obtaining the results of the one-dimensional DCT and inverse DCT operations, and the accumulation means accumulating the output data from the multiplication result output means and outputting the accumulated data; and second butterfly operation means for subjecting the output data from the operation means to the butterfly operation and outputting the result of the butterfly operation after rearranging it according to the order of input pixel data in the case where the control signal output from the control means indicates that the number of input pixel data is a power of 2 and that the inverse DCT operation is to be performed, and, in all other cases, performing no butterfly operation and outputting the output data of the operation means after rearranging it according to the order of input pixel data. Therefore, the quantity of multiplication-result data used for obtaining the results of the DCT operation and the inverse DCT operation is reduced, whereby the data capacity of the multiplication result output means for outputting this data is reduced, resulting in a DCT processor having a reduced circuit scale.

Further, in the present invention, on the basis of the output data from the first butterfly operation means, the number of input pixel data, and the value indicating whether the DCT operation or the inverse DCT operation is to be performed, the address generation means generates addresses as follows. When the control signal indicates that the number of input pixel data is any of 7, 6, 5, and 3, the address generation means generates an address by adding a header address of 2 bits, 3 bits, 4 bits, or 6 bits, which indicates the number of input pixel data and includes the value indicating either the DCT operation or the inverse DCT operation, to a bit string of 7 bits, 6 bits, 5 bits, or 3 bits, respectively, which is constituted based on the output data from the first butterfly operation means. When the control signal indicates that the number of input pixel data is any of 8, 4, and 2 and the DCT operation is to be performed, the address generation means generates an address by adding a header address of 5 bits, 7 bits, or 8 bits, which indicates the number of input pixel data and includes the value indicating that the DCT operation is to be performed, to a bit string of 4 bits, 2 bits, or 1 bit which is constituted based on the result of addition obtained in the butterfly operation by the first butterfly operation means, and to a bit string of 4 bits, 2 bits, or 1 bit which is constituted based on the result of subtraction obtained in the butterfly operation, respectively. When the control signal indicates that the number of input pixel data is any of 8, 4, and 2 and the inverse DCT operation is to be performed, the address generation means generates an address by adding a header address of 5 bits, 7 bits, or 8 bits, which indicates the number of input pixel data and includes the value indicating that the inverse DCT operation is to be performed, to a bit string of 4 bits, 2 bits, or 1 bit which is constituted based on the output of 8 bits, 4 bits, or 2 bits from the first butterfly operation means, respectively. The header addresses are bit strings which permit all of the addresses, obtained by adding the header addresses to the addresses based on the output data from the first butterfly operation means, to become continuous addresses. Therefore, the multiplication result output means can be efficiently mapped so that no useless area is generated in it, and thus the size of the multiplication result output means is reduced, whereby the circuit scale of the DCT processor is further reduced.
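To illustrate how header addresses of these lengths can yield continuous addresses, the sketch below packs every case into one 9-bit address space. The header bit patterns are NOT given in this passage; the pattern used here (a run of ones, a zero, then one DCT/inverse-DCT flag bit) is purely an assumption chosen so that the regions do not overlap. The names `LAYOUT`, `header_value`, `rom_address`, and `all_addresses` are hypothetical.

```python
# (number of input pixel data, header bits, bit-string bits), as listed above.
LAYOUT = [(7, 2, 7), (6, 3, 6), (5, 4, 5), (8, 5, 4),
          (3, 6, 3), (4, 7, 2), (2, 8, 1)]

def header_value(header_bits, inverse):
    """Assumed header pattern: (header_bits - 2) ones, a zero, then a
    flag bit (0 = DCT, 1 = inverse DCT). Not taken from the source."""
    ones = (1 << (header_bits - 2)) - 1
    return (ones << 2) | (1 if inverse else 0)

def rom_address(n, inverse, bit_string):
    """Concatenate the header and the bit string into a 9-bit ROM address."""
    for count, h_bits, s_bits in LAYOUT:
        if count == n:
            if not 0 <= bit_string < (1 << s_bits):
                raise ValueError("bit string out of range")
            return (header_value(h_bits, inverse) << s_bits) | bit_string
    raise ValueError("unsupported number of input pixel data")

def all_addresses():
    """Every address reachable for every size and both operations, sorted."""
    addrs = []
    for n, _, s_bits in LAYOUT:
        for inverse in (False, True):
            for bit_string in range(1 << s_bits):
                addrs.append(rom_address(n, inverse, bit_string))
    return sorted(addrs)

addresses = all_addresses()
```

Under this assumed assignment every header plus bit string is 9 bits long and the regions for the different sizes abut one another without gaps, so the table contents occupy one contiguous range.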

Further, in the present invention, the multiplication result output means outputs the results of multiplication as follows. When the control signal output from the control means indicates that the number of input pixel data is a power of 2 and the DCT operation is to be performed, the multiplication result output means outputs the result of multiplication with respect to the bit strings obtained from the output data of the first butterfly operation means, according to the DCT matrix operation using fast Fourier transform. When the control signal indicates that the number of input pixel data is a value other than a power of 2 and the DCT operation is to be performed, the multiplication result output means outputs the result of multiplication with respect to the bit strings obtained from the output data of the first butterfly operation means, according to the DCT matrix operation. When the control signal indicates that the number of input pixel data is a power of 2 and the inverse DCT operation is to be performed, the multiplication result output means outputs the result of multiplication with respect to the bit strings obtained from the output data of the first butterfly operation means, according to the inverse DCT matrix operation using fast Fourier transform. When the control signal indicates that the number of input pixel data is a value other than a power of 2 and the inverse DCT operation is to be performed, the multiplication result output means outputs the result of multiplication with respect to the bit strings obtained from the output data of the first butterfly operation means, according to the inverse DCT matrix operation. Therefore, the circuit scale of the DCT processor is further reduced.

Further, in the present invention, when the control signal indicates that the number of input pixel data is a value other than 8, the operation of any means which is not used for the operation is halted. Therefore, the power consumption is reduced.

Further, the present invention is a DCT processor performing a one-dimensional DCT operation on pixel data of image data in unit blocks each comprising N×M pixels (N, M: arbitrary integers not less than 1). This DCT processor comprises: bit slice means for receiving the pixel data of the image data in each N×M unit block for each row or column, slicing, bit by bit, the respective pixel data constituting the input rows or columns, and outputting the sliced pixel data; control means for outputting a control signal which indicates the number of input pixel data, that is, the number of pixel data constituting each input row or column; butterfly operation means for performing a butterfly operation on the output data from the bit slice means and outputting the result of the butterfly operation in the case where the control signal output from the control means indicates that the number of input pixel data is a power of 2, and, in all other cases, performing no butterfly operation and outputting the output data of the bit slice means as it is; address generation means for generating addresses by using bit strings obtained from the output data of the first butterfly operation means, and the number of input pixel data included in the control signal; operation means having plural sets of multiplication result output means and accumulation circuits, as many as the maximum value of the number of input pixel data, the multiplication result output means outputting, in accordance with the above-described addresses, the results of multiplication to be used for obtaining the result of the one-dimensional DCT operation, and the accumulation circuits accumulating the results of multiplication output from the respective multiplication result output means and outputting the accumulated results; and output means for rearranging the output data of the operation means according to the order of input pixel data, and outputting the rearranged data as the result of the one-dimensional DCT operation. Therefore, the quantity of multiplication-result data used for obtaining the result of the DCT operation is reduced, whereby the data capacity of the multiplication result output means for outputting this data is reduced, resulting in a DCT processor having a reduced circuit scale.

Further, in the present invention, on the basis of the output data from the first butterfly operation means and the number of input pixel data, the address generation means generates addresses as follows. When the control signal indicates that the number of input pixel data is a value other than a power of 2, the address generation means generates an address by adding a header address indicating the number of input pixel data to an address having a number of bits equal to the number of input pixel data, which is constituted based on the output data from the first butterfly operation means. When the control signal indicates that the number of input pixel data is a power of 2, the address generation means generates an address by adding a header address indicating the number of input pixel data to a bit string having a number of bits equal to half of the number of input pixel data, which is constituted based on the result of the addition obtained in the butterfly operation by the first butterfly operation means, and to a bit string having a number of bits equal to half of the number of input pixel data, which is constituted based on the result of the subtraction obtained in the butterfly operation. The header addresses are bit strings which permit all of the addresses, obtained by adding the header addresses to the addresses based on the output data from the first butterfly operation means, to become continuous addresses and to have a number of bits equal to the maximum value of the number of input pixel data. Therefore, the multiplication result output means can be efficiently mapped so that no useless area is generated in it, and thus the size of the multiplication result output means is reduced, whereby the circuit scale of the DCT processor is further reduced.

Further, in the present invention, the butterfly operation means performs a butterfly operation which outputs the values obtained by sequentially adding and subtracting the pixel data, which have been input for each row or column to the bit slice means and sliced bit by bit for output, starting from both ends of the input row or column and proceeding toward the inside. Therefore, the circuit scale of the DCT processor is further reduced.

Further, in the present invention, the multiplication result output means outputs the result of multiplication as follows. When the control signal output from the control means indicates that the number of input pixel data is a power of 2, the multiplication result output means outputs the result of multiplication with respect to the bit strings obtained from the output data of the first butterfly operation means according to the DCT matrix operation using fast Fourier transform. When the control signal indicates that the number of input pixel data is a value other than a power of 2, the multiplication result output means outputs the result of multiplication with respect to the bit strings obtained from the output data of the first butterfly operation means according to the DCT matrix operation. Therefore, the circuit scale of the DCT processor is further reduced.
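The effect of adding and subtracting from both ends toward the inside can be sketched as follows. Because column N−1−i of the DCT matrix equals column i multiplied by (−1)^u, the even-numbered outputs depend only on the sums and the odd-numbered outputs only on the differences, so each output needs a product-sum over N/2 terms instead of N. The sketch works on whole words rather than on bit-sliced data, and the function names are hypothetical.

```python
import math

def dct_matrix(n):
    """n x n forward DCT matrix with the scaling of formula (3)."""
    return [[math.sqrt(2 / n) * (1 / math.sqrt(2) if u == 0 else 1.0)
             * math.cos((2 * i + 1) * u * math.pi / (2 * n))
             for i in range(n)]
            for u in range(n)]

def butterfly(x):
    """Add and subtract pairs taken from both ends toward the inside."""
    n = len(x)
    sums = [x[i] + x[n - 1 - i] for i in range(n // 2)]
    diffs = [x[i] - x[n - 1 - i] for i in range(n // 2)]
    return sums, diffs

def dct_via_butterfly(x):
    """DCT for even-length input using only the first half of each row."""
    n = len(x)
    m = dct_matrix(n)
    sums, diffs = butterfly(x)
    out = []
    for u in range(n):
        half_input = sums if u % 2 == 0 else diffs
        out.append(sum(m[u][i] * half_input[i] for i in range(n // 2)))
    return out

x = [52, 55, 61, 66, 70, 61, 64, 73]
X = dct_via_butterfly(x)
```

For N=8 each product-sum now covers four values, which is consistent with the 4-bit bit strings mentioned for the addition and subtraction results.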

Further, the present invention is a DCT processor performing a one-dimensional inverse DCT operation on pixel data of image data in unit blocks each comprising N×M pixels (N, M: arbitrary integers not less than 1). This DCT processor comprises: bit slice means for receiving the pixel data of the image data in each N×M unit block for each row or column, slicing, bit by bit, the respective pixel data constituting the input rows or columns, and outputting the sliced pixel data; control means for outputting a control signal which includes the number of input pixel data, that is, the number of pixel data constituting each input row or column; address generation means for generating addresses using bit strings obtained from the output data of the bit slice means, and the number of input pixel data included in the control signal; operation means having plural sets of multiplication result output means and accumulation circuits, as many as the maximum value of the number of input pixel data, the multiplication result output means outputting, in accordance with the above-described addresses, the results of multiplication to be used for obtaining the result of the one-dimensional inverse DCT operation, and the accumulation circuits accumulating the results of multiplication output from the respective multiplication result output means and outputting the accumulated results; and butterfly operation means for performing a butterfly operation on the output data from the operation means and outputting the result of the butterfly operation after rearranging it according to the order of input pixel data in the case where the control signal output from the control means indicates that the number of input pixel data is a power of 2, and, in all other cases, performing no butterfly operation and outputting the output data of the operation means after rearranging it according to the order of input pixel data. Therefore, the quantity of multiplication-result data used for obtaining the result of the inverse DCT operation is reduced, whereby the data capacity of the multiplication result output means for outputting this data is reduced, resulting in a DCT processor having a reduced circuit scale.

Further, in the present invention, on the basis of the output data from the bit slice means and the number of input pixel data, the address generation means generates addresses as follows. When the control signal indicates that the number of input pixel data is a value other than a power of 2, the address generation means generates an address by adding a header address for indicating the number of input pixel data, to an address having the number of bits equal to the number of input pixel data, which is constituted based on the output data of the bit slice means. When the control signal indicates that the number of input pixel data is a power of 2, the address generation means generates an address by adding a header address for indicating the number of input pixel data, to a bit string having the number of bits equal to half of the number of input pixel data, which is constituted based on the output data from the bit slice means. The header addresses are bit strings which permit all of the addresses obtained by adding the header addresses to the addresses based on the output data of the bit slice means to become continuous addresses and have the number of bits equal to the maximum value of the number of input pixel data constituting the input row or column. Therefore, the multiplication result output means can be efficiently mapped so that no useless area is generated in the multiplication result output means, and thus the size of the multiplication result output means is reduced, whereby the circuit scale of the DCT processor is further reduced.

Further, in the present invention, the butterfly operation means performs butterfly operation for outputting the value obtained by addition and the value obtained by subtraction, which addition and subtraction are performed between the value obtained by accumulating the result of multiplication based on the odd-numbered pixel data amongst the pixel data input for each row or column, and the value obtained by accumulating the result of multiplication based on the even-numbered pixel data. Therefore, the circuit scale of the DCT processor is further reduced.

Further, in the present invention, the multiplication result output means outputs the result of multiplication as follows. When the control signal outputted from the control means indicates that the number of input pixel data is a power of 2, the multiplication result output means outputs the result of multiplication with respect to the bit strings obtained from the output data of the first butterfly operation means according to the inverse DCT matrix operation using fast Fourier transform. When the control signal indicates that the number of input pixel data is a value other than a power of 2, the multiplication result output means outputs the result of multiplication with respect to the bit strings obtained from the output data of the first butterfly operation means according to the inverse DCT matrix operation. Therefore, the circuit scale of the DCT processor is further reduced.

Further, in the present invention, the unit block of the image data to be input to the bit slice means is a unit block comprising N×M pixels (N,M: arbitrary integers from 1 to 8); and the operation means includes eight sets of multiplication result output means and accumulation means, which is equal to the maximum value of the number of input pixel data. Therefore, the circuit scale of the DCT processor is further reduced.

Further, in the present invention, the bit slice means receives 16-bit data as each pixel data to be input, slices this 16-bit data for every two bits, and outputs the sliced data; and the operation means is provided with, as each of the multiplication result output means, two multiplication result output units placed in parallel with each other, each outputting the result of multiplication, and data obtained by adding the outputs of the two multiplication result output units is accumulated by the corresponding accumulation means. Therefore, the circuit scale of the DCT processor is further reduced when the input pixel data is 16-bit data.
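The two-bits-per-cycle arrangement can be illustrated with a minimal Python sketch. It is only a software model under stated assumptions: the function names are hypothetical, the second table is assumed to hold values pre-weighted by 2, and the most significant bit is assumed to carry negative weight (two's-complement input). The text above does not specify any of these details.

```python
def build_rom(coeffs):
    """Table of partial sums: entry addr holds the sum of coeffs[i] for each set bit i."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]

def slice_addr(samples, bit):
    """Collect bit number `bit` of every sample into one table address."""
    addr = 0
    for i, s in enumerate(samples):
        addr |= ((s >> bit) & 1) << i
    return addr

def rac_dot_2bit(coeffs, samples, width=16):
    """Dot product of samples and coeffs, consuming two bits of each sample per cycle."""
    rom_lo = build_rom(coeffs)            # unit for the lower bit of each pair
    rom_hi = [2 * v for v in rom_lo]      # unit for the upper bit (assumed pre-weighted by 2)
    acc = 0.0
    cycles = 0
    for bit in range(0, width, 2):
        lo = rom_lo[slice_addr(samples, bit)]
        hi = rom_hi[slice_addr(samples, bit + 1)]
        if bit + 1 == width - 1:
            hi = -hi                      # assumed: sign bit has negative weight
        acc += (lo + hi) * (1 << bit)     # sum of both units goes to the accumulator
        cycles += 1
    return acc, cycles
```

Under these assumptions a 16-bit input is consumed in 8 accumulation cycles rather than 16, at the cost of a second table per accumulator.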

Further, in the present invention, when the control signal indicates that the number of input pixel data is a value other than the maximum value of the number of input pixel data, the operation of any means that is not used is halted. Therefore, the power consumption is reduced.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating the structure of a DCT processor according to a first embodiment of the present invention;

FIG. 2 is a diagram illustrating the internal structure of a first butterfly operation circuit according to the first embodiment of the present invention;

FIG. 3 is a diagram illustrating the internal structure of a second butterfly operation circuit according to the first embodiment of the present invention;

FIG. 4 is a map of ROM addresses according to the first embodiment of the present invention;

FIG. 5 is a map of data capacities used by ROMs according to the first embodiment of the present invention;

FIGS. 6(a)-6(d) are schematic diagrams for explaining DCT for a block of 8×7 pixels according to the first embodiment of the present invention;

FIGS. 7(a)-7(d) are schematic diagrams for explaining inverse DCT for a block of 6×4 pixels according to the first embodiment of the present invention;

FIG. 8 is a block diagram illustrating the structure of a conventional DCT processor;

FIG. 9 is a block diagram illustrating the structure of a DCT processor according to a fourth embodiment of the present invention;

FIG. 10 is a block diagram illustrating the structure of a DCT processor according to the fourth embodiment of the present invention; and

FIG. 11 is a block diagram illustrating the structure of a DCT processor according to the fourth embodiment of the present invention.

BEST MODE TO EXECUTE THE INVENTION

Embodiment 1.

A DCT processor according to a first embodiment receives pixel data of unit blocks each comprising N×M pixels (N,M: arbitrary integers from 1 to 8), for every column or row, and performs DCT or inverse DCT operation on these pixel data. This DCT processor utilizes frequency sampling type fast Fourier transform (hereinafter referred to as “FFT”) especially when N or M is a power of 2, i.e., when N=8, N=4, or N=2.

Using FFT, the above-described DCT matrix operation when N=8 is represented by

$$\begin{pmatrix} X0 \\ X2 \\ X4 \\ X6 \end{pmatrix} = \begin{pmatrix} 0.353553 & 0.353553 & 0.353553 & 0.353553 \\ 0.461940 & 0.191342 & -0.191342 & -0.461940 \\ 0.353553 & -0.353553 & -0.353553 & 0.353553 \\ 0.191342 & -0.461940 & 0.461940 & -0.191342 \end{pmatrix} \begin{pmatrix} x0 + x7 \\ x1 + x6 \\ x2 + x5 \\ x3 + x4 \end{pmatrix}$$

$$\begin{pmatrix} X1 \\ X3 \\ X5 \\ X7 \end{pmatrix} = \begin{pmatrix} 0.490393 & 0.415735 & 0.277785 & 0.097545 \\ 0.415735 & -0.097545 & -0.490393 & -0.277785 \\ 0.277785 & -0.490393 & 0.097545 & 0.415735 \\ 0.097545 & -0.277785 & 0.415735 & -0.490393 \end{pmatrix} \begin{pmatrix} x0 - x7 \\ x1 - x6 \\ x2 - x5 \\ x3 - x4 \end{pmatrix} \qquad \text{formula (18)}$$

when N=4, by

$$\begin{pmatrix} X0 \\ X2 \end{pmatrix} = \begin{pmatrix} 0.500000 & 0.500000 \\ 0.500000 & -0.500000 \end{pmatrix} \begin{pmatrix} x0 + x3 \\ x1 + x2 \end{pmatrix}$$

$$\begin{pmatrix} X1 \\ X3 \end{pmatrix} = \begin{pmatrix} 0.653281 & 0.270598 \\ 0.270598 & -0.653281 \end{pmatrix} \begin{pmatrix} x0 - x3 \\ x1 - x2 \end{pmatrix} \qquad \text{formula (19)}$$

and when N=2, by

$$\begin{pmatrix} X0 \\ X1 \end{pmatrix} = 0.707107 \begin{pmatrix} x0 + x1 \\ x0 - x1 \end{pmatrix} \qquad \text{formula (20)}$$

On the other hand, the inverse DCT matrix operation when N=8 is represented by

$$\begin{pmatrix} x0 \\ x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} 0.353553 & 0.461940 & 0.353553 & 0.191342 \\ 0.353553 & 0.191342 & -0.353553 & -0.461940 \\ 0.353553 & -0.191342 & -0.353553 & 0.461940 \\ 0.353553 & -0.461940 & 0.353553 & -0.191342 \end{pmatrix} \begin{pmatrix} X0 \\ X2 \\ X4 \\ X6 \end{pmatrix} + \begin{pmatrix} 0.490393 & 0.415735 & 0.277785 & 0.097545 \\ 0.415735 & -0.097545 & -0.490393 & -0.277785 \\ 0.277785 & -0.490393 & 0.097545 & 0.415735 \\ 0.097545 & -0.277785 & 0.415735 & -0.490393 \end{pmatrix} \begin{pmatrix} X1 \\ X3 \\ X5 \\ X7 \end{pmatrix}$$

$$\begin{pmatrix} x7 \\ x6 \\ x5 \\ x4 \end{pmatrix} = \begin{pmatrix} 0.353553 & 0.461940 & 0.353553 & 0.191342 \\ 0.353553 & 0.191342 & -0.353553 & -0.461940 \\ 0.353553 & -0.191342 & -0.353553 & 0.461940 \\ 0.353553 & -0.461940 & 0.353553 & -0.191342 \end{pmatrix} \begin{pmatrix} X0 \\ X2 \\ X4 \\ X6 \end{pmatrix} - \begin{pmatrix} 0.490393 & 0.415735 & 0.277785 & 0.097545 \\ 0.415735 & -0.097545 & -0.490393 & -0.277785 \\ 0.277785 & -0.490393 & 0.097545 & 0.415735 \\ 0.097545 & -0.277785 & 0.415735 & -0.490393 \end{pmatrix} \begin{pmatrix} X1 \\ X3 \\ X5 \\ X7 \end{pmatrix} \qquad \text{formula (21)}$$

when N=4, by

$$\begin{pmatrix} x0 \\ x1 \end{pmatrix} = \begin{pmatrix} 0.500000 & 0.500000 \\ 0.500000 & -0.500000 \end{pmatrix} \begin{pmatrix} X0 \\ X2 \end{pmatrix} + \begin{pmatrix} 0.653281 & 0.270598 \\ 0.270598 & -0.653281 \end{pmatrix} \begin{pmatrix} X1 \\ X3 \end{pmatrix}$$

$$\begin{pmatrix} x3 \\ x2 \end{pmatrix} = \begin{pmatrix} 0.500000 & 0.500000 \\ 0.500000 & -0.500000 \end{pmatrix} \begin{pmatrix} X0 \\ X2 \end{pmatrix} - \begin{pmatrix} 0.653281 & 0.270598 \\ 0.270598 & -0.653281 \end{pmatrix} \begin{pmatrix} X1 \\ X3 \end{pmatrix} \qquad \text{formula (22)}$$

and when N=2, by

$$x0 = 0.707107 \cdot X0 + 0.707107 \cdot X1, \qquad x1 = 0.707107 \cdot X0 - 0.707107 \cdot X1 \qquad \text{formula (23)}$$

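As can be seen from these formulae, the complexity of each matrix operation is significantly reduced by using FFT. When N=8, for example, one 8×8 matrix product (64 multiplications) is replaced by two 4×4 matrix products (32 multiplications).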

In this first embodiment, the use of so-called butterfly operation makes it possible to apply FFT to the matrix expression of DCT or inverse DCT, whereby DCT or inverse DCT operation is executed with less operational complexity.
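The equivalence between the butterfly-factored form and the plain matrix form can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, and the orthonormal DCT-II scaling is an assumption chosen because it reproduces the coefficients printed in formulae (18) to (20).

```python
import math

def _scale(k, n):
    # Orthonormal DCT-II scale factor (assumed; it matches 0.353553, 0.5, 0.707107).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct_direct(x):
    """N-point DCT computed from the full N x N matrix."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                               for i in range(n))
            for k in range(n)]

def dct_butterfly(x):
    """Same transform for even N, using sums and differences of mirrored inputs."""
    n = len(x)
    half = n // 2
    sums = [x[i] + x[n - 1 - i] for i in range(half)]    # x0+x7, x1+x6, ...
    diffs = [x[i] - x[n - 1 - i] for i in range(half)]   # x0-x7, x1-x6, ...
    out = []
    for k in range(n):
        src = sums if k % 2 == 0 else diffs              # even outputs use sums
        out.append(_scale(k, n) * sum(src[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                                      for i in range(half)))
    return out
```

Each output of `dct_butterfly` needs only N/2 multiplications, which is the saving the half-size matrices above express.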

Hereinafter, the structure of the DCT processor will be described.

FIG. 1 is a block diagram illustrating the structure of the DCT processor according to the first embodiment. In FIG. 1, the DCT processor 100 comprises a control circuit 101 which outputs a signal indicating the number N or M of pixel data constituting a unit block of N×M pixels; a bit slice circuit 102 which performs shift output, bit by bit, from the LSB of each 8-bit pixel data; a first butterfly operation circuit 103 which performs butterfly operation on the output from the bit slice circuit 102; a ROM address generation circuit 104 which generates ROM addresses on the basis of the output from the first butterfly operation circuit 103; a RAC 105 which reads ROM data corresponding to the ROM addresses and accumulates them; and a second butterfly operation circuit 106 which performs butterfly operation on the output from the RAC 105.

Further, the RAC 105 comprises ROM0, ROM1, ROM2, ROM3, ROM4, ROM5, ROM6, and ROM7 for performing DCT and inverse DCT operations; and accumulation circuits 51 a, 51 b, 51 c, 51 d, 51 e, 51 f, 51 g, and 51 h for accumulating the outputs from the respective ROMs. In this first embodiment, the ROM0 to ROM7 are plural ROM areas in a single ROM.

This DCT processor employs the DA method for matrix operation. The results of multiplication with respect to the bit strings obtained by taking the data, bit by bit, from the outputs of the first butterfly operation circuit, according to the DCT matrix operation, the inverse DCT matrix operation, the DCT matrix operation using FFT, and the inverse DCT matrix operation using FFT, are stored as tables in the ROM0 to ROM7. By inputting the bit strings obtained from the outputs of the first butterfly operation circuit 103 as addresses to the respective ROMs, the result of multiplication is output from each ROM, and these results are output sequentially from the LSB to the MSB of each pixel data and accumulated by the accumulation circuits 51 a to 51 h, thereby obtaining the result of matrix operation.
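The table lookup and accumulation can be modelled in a few lines of Python. This is a sketch under assumptions rather than a description of the circuit: the names `build_rom` and `rac_dot` are hypothetical, the inputs are taken to be two's-complement integers whose MSB carries negative weight, and the weighting by powers of two stands in for whatever shifting the accumulation circuits perform.

```python
def build_rom(coeffs):
    """ROM table: entry addr is the sum of coeffs[i] over every set bit i of addr."""
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n)]

def rac_dot(rom, samples, width=8):
    """Bit-serial dot product of samples with the coefficients baked into rom."""
    acc = 0.0
    for bit in range(width):                     # LSB first, as in the text
        addr = 0
        for i, s in enumerate(samples):
            addr |= ((s >> bit) & 1) << i        # one bit from each input forms the address
        term = rom[addr] * (1 << bit)            # weight the partial product by 2**bit
        # Assumed: the MSB of a two's-complement word has negative weight.
        acc += -term if bit == width - 1 else term
    return acc
```

For example, building the table from the first row of the even-index matrix in formula (18) and feeding four 8-bit values reproduces their weighted sum without any multiplier in the loop.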

The ROM0 to ROM7 contain, as the result of multiplication in the case where the number N or M of pixel data is a power of 2, the result of multiplication with respect to the bit strings obtained from the outputs of the first butterfly operation circuit 103 according to the DCT and inverse DCT matrix operations using FFT. Further, these ROMs contain, as the result of multiplication in the case where the number N or M is not a power of 2, the result of multiplication with respect to the bit strings obtained from the outputs of the first butterfly operation circuit 103 according to the normal DCT and inverse DCT matrix operations using no FFT.

To be specific, the partial products calculated between the bit strings obtained from the bit-unit outputs of the first butterfly operation circuit 103 and the coefficients of the matrix operations represented by formulae (5) to (7), (9), (12) to (14), (16), and (18) to (23) are stored as tables in the ROM0 to ROM7.

Since the value of N or M indicating the number of input pixel data is variable, the number of sets of ROMs and accumulators is eight, which is the maximum value of N or M. Further, in this first embodiment, since the value of N or M is variable and the coefficients used for the DCT and inverse DCT operations are also variable, the partial products for each value of N or M are separately stored in the ROM0 to ROM7.

FIG. 2 is a block diagram illustrating an example of the internal structure of the first butterfly operation circuit 103. This first butterfly operation circuit 103 performs butterfly operation when the control signal indicates the DCT operation and N or M indicating the number of pixel data is a power of 2, i.e., 2, 4, or 8. In all other cases, the circuit 103 outputs the data without performing butterfly operation. The first butterfly operation circuit 103 comprises data lines 30 a, 30 b, 30 c, 30 d, 30 e, 30 f, 30 g, and 30 h which receive the bit signals of the respective pixel data outputted from the bit slice circuit 102; a first selection circuit 31 a which selects the data line 30 h when the control signal indicates that N or M is 8, selects the data line 30 d when the control signal indicates that N or M is 4, and selects the data line 30 b when the control signal indicates that N or M is 2; a second selection circuit 31 b which selects the data line 30 g when the control signal indicates that N or M is 8, and selects the data line 30 c when the control signal indicates that N or M is 4; a first addition circuit 32 a which adds the data supplied from the data line 30 a and the first selection circuit 31 a; a second addition circuit 32 b which adds the data supplied from the data line 30 b and the second selection circuit 31 b; a third addition circuit 32 c which adds the data supplied from the data line 30 c and the data line 30 f; a fourth addition circuit 32 d which adds the data supplied from the data line 30 d and the data line 30 e; a first subtraction circuit 33 a which performs subtraction on the data supplied from the data line 30 d and the data line 30 e; a second subtraction circuit 33 b which performs subtraction on the data supplied from the data line 30 c and the data line 30 f; a third subtraction circuit 33 c which performs subtraction on the data supplied from the data line 30 b and the second selection circuit 31 b; and a fourth subtraction circuit 33 d which performs subtraction on the data supplied from the data line 30 a and the first selection circuit 31 a. This first butterfly operation circuit 103 performs the butterfly operation as follows. That is, the pixel data, which have been input to the bit slice circuit 102 for each row or column and then sliced bit by bit to be output, are added to or subtracted from each other in pairs, starting from both ends of the input column or row and working toward the inside, and the values thus obtained are output.
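The role of the two selection circuits can be illustrated with a small Python model. It is a sketch only: it works on whole sample values rather than bit-serial streams, the names are hypothetical, and the pairings are taken from formulae (18) to (20), so that for N=2 the first selector pairs x0 with x1 and for N=8 the third adder pairs x2 with x5.

```python
# Index 0..7 stands for data lines 30a..30h.
SEL_31A = {8: 7, 4: 3, 2: 1}   # line paired with 30a by the first selection circuit
SEL_31B = {8: 6, 4: 2}         # line paired with 30b by the second selection circuit

def first_butterfly(lines, n):
    """Return (sums, diffs) produced for a row or column of n samples, n in {2, 4, 8}."""
    if n not in SEL_31A:
        raise ValueError("butterfly applies only when N is 2, 4 or 8")
    sums = [lines[0] + lines[SEL_31A[n]]]            # adder 32a
    diffs = [lines[0] - lines[SEL_31A[n]]]           # subtractor 33d
    if n >= 4:
        sums.append(lines[1] + lines[SEL_31B[n]])    # adder 32b
        diffs.append(lines[1] - lines[SEL_31B[n]])   # subtractor 33c
    if n == 8:
        sums += [lines[2] + lines[5], lines[3] + lines[4]]    # adders 32c, 32d
        diffs += [lines[2] - lines[5], lines[3] - lines[4]]   # subtractors 33b, 33a
    return sums, diffs
```

Only the partners of the two outermost inputs change with N, which is why two selectors suffice in this model.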

FIG. 3 is a block diagram illustrating an example of the internal structure of the second butterfly operation circuit 106. This second butterfly operation circuit 106 performs butterfly operation when the control signal indicates the inverse DCT operation and N or M indicating the number of pixel data is a power of 2, i.e., 2, 4, or 8. In all other cases, it outputs the data without performing butterfly operation. The second butterfly operation circuit 106 comprises registers 60 a, 60 b, 60 c, 60 d, 60 e, 60 f, 60 g, and 60 h which latch the outputs from the accumulation circuits 51 a, 51 b, 51 c, 51 d, 51 e, 51 f, 51 g, and 51 h included in the RAC 105, respectively; a register 61 a which latches the outputs from the registers 60 a, 60 c, 60 e, and 60 g; a register 61 b which latches the outputs from the registers 60 b, 60 d, 60 f, and 60 h; an adder 62 which adds the data supplied from the register 61 a and the register 61 b; and a register 63 which latches the output of the adder 62. This second butterfly operation circuit 106 performs the butterfly operation as follows. That is, the result of the arithmetic operation performed between the odd-numbered pixel data, amongst the pixel data which have been input for each row or column and output from the RAC 105, and the matrix coefficients obtained by FFT is added to or subtracted from the result of the arithmetic operation performed between the even-numbered pixel data and the matrix coefficients obtained by FFT, and the values thus obtained are output.

Next, the ROM addresses generated by the ROM address generation circuit 104 will be described. This ROM address generation circuit 104 generates ROM addresses by adding header addresses to the bit strings constituted by the outputs of the first butterfly operation circuit 103, and all of the addresses obtained by adding these header addresses become continuous addresses. These header addresses are determined on the basis of the value of N or M and the value indicating whether the DCT operation or the inverse DCT operation is to be performed, which values are indicated by the control signal outputted from the control circuit 101.

As shown in FIG. 4, in the DCT operation where the value of N or M is 7, a 7-bit signal based on signals A6, A5, A4, A3, A2, A1, and A0 which are output from the data lines 30 g, 30 f, 30 e, 30 d, 30 c, 30 b, and 30 a of the first butterfly operation circuit 103, respectively, is given 0 as its upper bit A7 and, further, the 8-bit signal based on these signals A7, A6, A5, A4, A3, A2, A1, and A0 is given 0 as a value indicating that the DCT operation is to be performed, and the 9-bit signal thus generated is used as a ROM address.

Likewise, in the DCT operation where the value of N or M is 6, a 6-bit signal based on the signals A5, A4, A3, A2, A1, and A0 which are output from the data lines 30 f, 30 e, 30 d, 30 c, 30 b, and 30 a of the first butterfly operation circuit 103 is given 1 and 0 as its upper bits A7 and A6. When the value of N or M is 5, a 5-bit signal based on the signals A4, A3, A2, A1, and A0 which are output from the data lines 30 e, 30 d, 30 c, 30 b, and 30 a of the first butterfly operation circuit 103 is given 1, 1, and 0 as its upper bits A7, A6, and A5. When the value of N or M is 8, a 4-bit signal based on the signals A3, A2, A1, and A0 which are output from the data lines 34 a, 34 b, 34 c, and 34 d or the data lines 34 e, 34 f, 34 g, and 34 h of the first butterfly operation circuit 103 is given 1, 1, 1, and 0 as its upper bits A7, A6, A5, and A4. When the value of N or M is 3, a 3-bit signal based on the signals A2, A1, and A0 which are output from the data lines 30 c, 30 b, and 30 a of the first butterfly operation circuit 103 is given 1, 1, 1, 1, and 0 as its upper bits A7, A6, A5, A4, and A3. When the value of N or M is 4, a 2-bit signal based on the signals A1 and A0 which are output from the data lines 34 a and 34 b or the data lines 34 e and 34 f of the first butterfly operation circuit 103 is given 1, 1, 1, 1, 1, and 0 as its upper bits A7, A6, A5, A4, A3, and A2. When the value of N or M is 2, a 1-bit signal based on the signal A0 which is output from the data line 34 a or the data line 34 e of the first butterfly operation circuit 103 is given 1, 1, 1, 1, 1, 1, and 0 as its upper bits A7, A6, A5, A4, A3, A2, and A1. Further, each of the 8-bit signals comprising A7, A6, A5, A4, A3, A2, A1, and A0 is given 0 indicating the DCT operation, and the 9-bit signal thus generated is used as a ROM address.

In the case of the inverse DCT operation, an 8-bit signal comprising A7, A6, A5, A4, A3, A2, A1, and A0 is given 1 as its MSB to generate a ROM address.

That is, in the inverse DCT operation where the value of N or M is 7, a 7-bit signal based on the signals A6, A5, A4, A3, A2, A1, and A0 which are output from the data lines 30 g, 30 f, 30 e, 30 d, 30 c, 30 b, and 30 a of the first butterfly operation circuit 103 is given 0 as its upper bit A7 and, further, the 8-bit signal comprising the signals A7, A6, A5, A4, A3, A2, A1, and A0 is given 1 indicating the inverse DCT operation, and the 9-bit signal thus generated is used as a ROM address.

Likewise, in the inverse DCT operation where the value of N or M is 6, a 6-bit signal based on the signals A5, A4, A3, A2, A1, and A0 which are output from the data lines 30 f, 30 e, 30 d, 30 c, 30 b, and 30 a of the first butterfly operation circuit 103 is given 1 and 0 as its upper bits A7 and A6. When the value of N or M is 5, a 5-bit signal based on the signals A4, A3, A2, A1, and A0 which are output from the data lines 30 e, 30 d, 30 c, 30 b, and 30 a of the first butterfly operation circuit 103 is given 1, 1, and 0 as its upper bits A7, A6, and A5. When the value of N or M is 8, a 4-bit signal based on the signals A3, A2, A1, and A0 which are output from the data lines 30 g, 30 e, 30 c, and 30 a or the data lines 30 h, 30 f, 30 d, and 30 b of the first butterfly operation circuit 103 is given 1, 1, 1, and 0 as its upper bits A7, A6, A5, and A4. When the value of N or M is 3, a 3-bit signal constituted by the signals A2, A1, and A0 which are output from the data lines 30 c, 30 b, and 30 a of the first butterfly operation circuit 103 is given 1, 1, 1, 1, and 0 as its upper bits A7, A6, A5, A4, and A3. When the value of N or M is 4, a 2-bit signal constituted by the signals A1 and A0 which are output from the data lines 30 c and 30 a or the data lines 30 d and 30 b of the first butterfly operation circuit 103 is given 1, 1, 1, 1, 1, and 0 as its upper bits A7, A6, A5, A4, A3, and A2. When the value of N or M is 2, a 1-bit signal constituted by the signal A0 which is output from the data line 30 a or 30 b of the first butterfly operation circuit 103 is given 1, 1, 1, 1, 1, 1, and 0 as its upper bits A7, A6, A5, A4, A3, A2, and A1. Further, each of the 8-bit signals comprising A7, A6, A5, A4, A3, A2, A1, and A0 is given 1 indicating the inverse DCT operation, and the 9-bit signal thus generated is used as a ROM address.

With the address generation described above, the number of addresses held by each ROM in the RAC 105 can be reduced to 512. However, this address space includes four unused addresses, as shown in FIG. 4.
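The header scheme described for N or M from 2 to 8 can be summarised in a short Python sketch. The table and function names are hypothetical, and the case N=1 is left out because the passage above does not give a header for it.

```python
# Header placed in the upper bits of A7..A0, and number of data bits, per N or M.
HEADER = {
    7: ("0", 7),
    6: ("10", 6),
    5: ("110", 5),
    8: ("1110", 4),      # power of 2: only N/2 data bits after the butterfly
    3: ("11110", 3),
    4: ("111110", 2),
    2: ("1111110", 1),
}

def rom_address(n, data_bits, inverse=False):
    """Build the 9-bit ROM address: mode bit, header, then the data bits."""
    prefix, width = HEADER[n]
    if not 0 <= data_bits < (1 << width):
        raise ValueError("data_bits does not fit in %d bits" % width)
    low8 = (int(prefix, 2) << width) | data_bits   # A7..A0
    return (int(inverse) << 8) | low8              # MSB: 0 = DCT, 1 = inverse DCT

def unused_addresses():
    """Addresses in the 512-entry space that no (mode, N, data) combination reaches."""
    used = {rom_address(n, d, inv)
            for inv in (False, True)
            for n, (_, width) in HEADER.items()
            for d in range(1 << width)}
    return sorted(set(range(512)) - used)
```

In this model `format(rom_address(8, 0b1010), "09b")` gives `011101010`, and `unused_addresses()` returns the four addresses 254, 255, 510, and 511, which agrees with the count of four unused addresses stated above.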

FIG. 5 is a map of ROM data recorded in the respective ROMs of the RAC 105. The results of multiplication for performing the DCT or inverse DCT operation in the case where N or M is 1 to 8 are respectively recorded in the ROM0, ROM1, ROM2, ROM3, ROM4, ROM5, ROM6, and ROM7 in association with the ROM addresses.

Hereinafter, the operation of the DCT processor 100 will be described.

The operation will be described on the assumption that the DCT processor 100 receives 8×7 pixel data and performs the DCT operation.

Initially, the control circuit 101 outputs a signal indicating the number N or M of input pixel data, and either the DCT operation or the inverse DCT operation. In this case, the control signal indicates N=8, M=7, and DCT operation. Next, the bit slice circuit 102 outputs the eight pieces of pixel data in the column direction, bit by bit, from the LSB of each pixel data. The first butterfly operation circuit 103 receives the signal indicating N=8, and performs the butterfly operation represented by formula (18). That is, the first selection circuit 31 a selects the data line 30 h, and the second selection circuit 31 b selects the data line 30 g. The first addition circuit 32 a adds the signal supplied from the data line 30 h selected by the first selection circuit 31 a, and the signal supplied from the data line 30 a. Further, the second addition circuit 32 b adds the signal supplied from the data line 30 g selected by the second selection circuit 31 b, and the signal supplied from the data line 30 b. Further, the third addition circuit 32 c adds the data supplied from the data line 30 c and the data line 30 f, and the fourth addition circuit 32 d adds the data supplied from the data line 30 d and the data line 30 e.

On the other hand, the first subtraction circuit 33 a performs subtraction on the signals supplied from the data line 30 d and the data line 30 e, and the second subtraction circuit 33 b performs subtraction on the data supplied from the data line 30 c and the data line 30 f. Further, the third subtraction circuit 33 c performs subtraction on the data supplied from the data line 30 g selected by the second selection circuit 31 b and the data supplied from the data line 30 b, and the fourth subtraction circuit 33 d performs subtraction on the data supplied from the data line 30 h selected by the first selection circuit 31 a and the data supplied from the data line 30 a.

In this way, the first butterfly operation circuit 103 performs the butterfly operation. This operation is equivalent to the additions and subtractions on the right side of formula (18), namely, x0+x7, x1+x6, x2+x5, x3+x4, x0−x7, x1−x6, x2−x5, and x3−x4.

The ROM address generation circuit 104 generates a ROM address signal on the basis of the output from the first butterfly operation circuit 103, and outputs the signal. That is, the ROM address generation circuit 104 adds 01110, as upper five bits, to a 4-bit signal constituted by signals indicating x0+x7, x1+x6, x2+x5, and x3+x4 in this order, thereby generating a 9-bit ROM address. This ROM address is output to the ROM0, ROM2, ROM4, and ROM6 of the RAC 105. Further, the ROM address generation circuit 104 adds 01110, as upper five bits, to a 4-bit signal constituted by signals indicating x0−x7, x1−x6, x2−x5, and x3−x4 in this order, thereby generating a 9-bit ROM address. This ROM address is output to the ROM1, ROM3, ROM5, and ROM7 of the RAC 105.

The ROM0, ROM1, ROM2, ROM3, ROM4, ROM5, ROM6, and ROM7 in the RAC 105 output the data corresponding to the ROM addresses generated by the ROM address generation circuit 104, and the accumulation circuits 51 a, 51 b, 51 c, 51 d, 51 e, 51 f, 51 g, and 51 h accumulate the outputs from the respective ROMs and output the results. Thereby, X0, X2, X4, X6, X1, X3, X5, and X7 shown in formula (18) are calculated.

The second butterfly operation circuit 106 outputs the data supplied from the accumulation circuits 51 a, 51 b, 51 c, 51 d, 51 e, 51 f, 51 g, and 51 h of the RAC 105, as DCT-processed eight pieces of pixel data. To be specific, the registers 60 a, 60 b, 60 c, 60 d, 60 e, 60 f, 60 g, and 60 h of the second butterfly operation circuit 106 latch the output signals from the accumulation circuits 51 a, 51 b, 51 c, 51 d, 51 e, 51 f, 51 g, and 51 h, respectively, and output these signals in the order in which they were input.

In this way, the series of operations described above is repeated seven times, once for every 8 pieces of pixel data which are input along the column direction (FIG. 6(a)), whereby the interim results for 56 pieces of pixel data are output to end the one-dimensional DCT operation (FIG. 6(b)).

Next, the 56 pieces of interim results (FIG. 6(b)) are input to the DCT processor 100, for every seven pieces of pixel data along the row direction. In this case, the operation represented by formula (5) is executed. In like manner, the series of operations described above is repeated eight times, once for every 7 pieces of pixel data (FIG. 6(c)), thereby completing the two-dimensional DCT operation for the 56 pieces of pixel data (FIG. 6(d)).

In this case, since the eighth pixel data does not exist, the bit slice circuit 102 performs bit slicing on the seven pieces of input pixel data, and the ROM7 and the accumulation circuit 51 h in the RAC 105 do not operate.
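The two-pass procedure for the 8×7 block can be sketched in Python as follows. This is a floating-point model only: the function names are hypothetical, the orthonormal DCT-II scaling is an assumption carried over from the coefficients of formula (18), and the block is taken to have 8 rows and 7 columns so that each column holds eight pixels.

```python
import math

def dct_1d(x):
    """Orthonormal N-point DCT-II (scaling assumed from formula (18))."""
    n = len(x)
    return [(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n))
            for k in range(n)]

def dct_2d(block):
    """Two-dimensional DCT of block[row][col]: all columns first, then all rows."""
    rows, cols = len(block), len(block[0])
    # Pass 1: one transform per column (7 transforms of 8 points for an 8x7 block).
    col_out = [dct_1d([block[r][c] for r in range(rows)]) for c in range(cols)]
    interim = [[col_out[c][r] for c in range(cols)] for r in range(rows)]
    # Pass 2: one transform per row (8 transforms of 7 points).
    return [dct_1d(row) for row in interim]
```

For a block whose 56 pixels all equal 1, this model yields a single non-zero coefficient, the DC term √56, with every other output zero to within rounding.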

Next, a description will be given of the case where the DCT processor 100 receives 6×4 pixel data, and performs the inverse DCT operation.

Initially, the control circuit 101 outputs a signal indicating the number N or M of input pixel data, and either the DCT operation or the inverse DCT operation. In this case, the control signal indicates N=6, M=4, and inverse DCT operation. Next, the bit slice circuit 102 outputs the input six pieces of pixel data in the column direction, bit by bit, from the LSB of each pixel data. The first butterfly operation circuit 103 receives the signal indicating N=6 and inverse DCT, and outputs the input pixel data as it is without performing butterfly operation.

The ROM address generation circuit 104 generates a ROM address signal on the basis of the output from the first butterfly operation circuit 103, and outputs the signal.

When N=6, the ROM address generation circuit 104 adds 110, as upper three bits, to a 6-bit signal constituted by the signals A5, A4, A3, A2, A1, and A0 in this order, thereby generating a 9-bit ROM address. This ROM address is output to the ROM0, ROM1, ROM2, ROM3, ROM4, and ROM5 in the RAC 105.

The ROM0, ROM1, ROM2, ROM3, ROM4, and ROM5 in the RAC 105 output the data corresponding to the ROM address generated by the ROM address generation circuit 104, and the accumulation circuits 51 a, 51 b, 51 c, 51 d, 51 e, and 51 f accumulate the outputs from the respective ROMs and output the results. Thereby, X0, X1, X2, X3, X4, and X5 shown in formula (13) are calculated.

In the RAC 105, the ROM6, the ROM7, and the accumulation circuits 51 gand 51 h do not operate because there are no corresponding input pixeldata.

The second butterfly operation circuit receives the control signalindicating N=6 and inverse DCT operation, and outputs the data suppliedfrom the accumulation circuits 51 a, 51 b, 51 c, 51 d, 51 e, and 51 f,as inverse-DCT-processed six pieces of pixel data. That is, theregisters 60 a, 60 b, 60 c, 60 d, 60 e, and 60 f of the second butterflyoperation circuit 106 latch the output signals from the accumulationcircuits 51 a, 51 b, 51 c, 51 d, 51 e, and 51 f of the RAC 105, andoutput these signals in the order as inputted.

In this way, a series of operations described above are repeated fourtimes for every six pieces of pixel data which are input along thecolumn direction (FIG. 7(a)), whereby the interim results for 24 piecesof pixel data are output to end the one-dimensional inverse DCToperation (FIG. 7(b)).

Next, the 24 pieces of interim results (FIG. 7(b)) are input to the DCTprocessor 100 for every four pieces of pixel data along the rowdirection. In like manner as described above, a series of operationsdescribed above are repeated six times for every four pieces of pixeldata (FIG. 7(c)), thereby completing the two-dimensional inverse DCToperation for the 24 pieces of pixel data (FIG. 7(d)).

More specifically, the control circuit 101 outputs a signal indicating M=4 and inverse DCT operation. Next, the bit slice circuit 102 outputs the input four pieces of pixel data along the row direction, bit by bit, starting from the LSB of each pixel data. The first butterfly operation circuit 103 receives the signal indicating M=4 and inverse DCT operation, and outputs the bit-by-bit-sliced input pixel data as it is without performing butterfly operation.

The ROM address generation circuit 104 generates a ROM address signal on the basis of the output from the first butterfly operation circuit 103, and outputs it. That is, 0111110 is added, as upper seven bits, to a 2-bit signal which is output from the data lines 30 c and 30 a or the data lines 30 d and 30 b, thereby generating a 9-bit ROM address. This ROM address is output to the ROM0 and ROM2 or the ROM1 and ROM3 of the RAC 105.

The ROM0, ROM1, ROM2, and ROM3 in the RAC 105 output the data corresponding to the ROM address generated by the ROM address generation circuit 104, and the accumulation circuits 51 a, 51 b, 51 c, and 51 d accumulate the outputs from the respective ROMs, and output the results.
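The table look-up and accumulation performed by one ROM and its accumulation circuit can be modelled, purely as a sketch, in the following way. The names `build_rom` and `rac`, the integer coefficients, and the restriction to unsigned 8-bit samples are assumptions made for illustration; header addresses are omitted, and a hardware accumulator would typically shift rather than multiply by a power of two.

```python
def build_rom(coeffs):
    """Table of partial products: entry a is the sum of coeffs[k] over set bits k of a."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
            for a in range(1 << n)]


def rac(samples, rom, nbits=8):
    """ROM-and-accumulate model: one table look-up per bit plane, LSB first."""
    acc = 0
    for b in range(nbits):
        # Bit slice: bit b of every sample forms the table address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        acc += rom[addr] * (1 << b)  # weight the partial product by 2^b
    return acc


coeffs = [3, -1, 4, 2]          # hypothetical coefficients for one output value
samples = [17, 200, 5, 255]     # unsigned 8-bit inputs
print(rac(samples, build_rom(coeffs)) == sum(c * x for c, x in zip(coeffs, samples)))
```

The accumulated value equals the inner product of the coefficients and the samples, although no multiplier is used.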

The second butterfly operation circuit 106 receives the control signal indicating M=4 and inverse DCT operation, performs butterfly operation on the outputs from the accumulation circuits 51 a, 51 b, 51 c, and 51 d in the RAC 105, and outputs the results. That is, the outputs from the accumulation circuits 51 a, 51 b, 51 c, and 51 d in the RAC 105 are latched by the registers 60 a, 60 b, 60 c, and 60 d. The register 61 a performs data latch four times in the following order: the output of the register 60 a, the output of the register 60 c, the output of the register 60 a, and the output of the register 60 c. On the other hand, the register 61 b performs data latch four times in the following order: the output of the register 60 b, the output of the register 60 d, the inverse output of the register 60 b, and the inverse output of the register 60 d. The adder 62 sequentially adds the output from the register 61 a and the output from the register 61 b. Thereby, x0, x1, x2, and x3 shown in formula (22) are calculated. The register 63 sequentially latches the outputs from the adder 62, and outputs them.
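The latch sequence of the second butterfly operation circuit for M=4 can be sketched as below. Two assumptions are made here: the register 61 a is taken to latch the outputs of the registers 60 a, 60 c, 60 a, and 60 c in turn, and the "inverse output" of a register is modelled as arithmetic negation. The function name `second_butterfly_m4` is hypothetical.

```python
def second_butterfly_m4(r60a, r60b, r60c, r60d):
    """Return the four sums produced by the adder 62, in latch order."""
    seq_61a = [r60a, r60c, r60a, r60c]       # values latched by register 61a
    seq_61b = [r60b, r60d, -r60b, -r60d]     # values latched by register 61b
    return [a + b for a, b in zip(seq_61a, seq_61b)]


print(second_butterfly_m4(1, 2, 3, 4))  # prints [3, 7, -1, -1]
```

The four outputs are thus the two sums and the two differences of the register pairs (60 a, 60 b) and (60 c, 60 d).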

As described above, according to the DCT processor of the first embodiment, the butterfly operations performed by the first butterfly operation circuit 103 and the second butterfly operation circuit 106 are controlled by the control signal which indicates the value of N or M and either DCT operation or inverse DCT operation, and the ROM address generation circuit 104 generates the ROM addresses corresponding to the DCT operation or the inverse DCT operation for the pixel data in N or M unit blocks, whereby both of the DCT operation and the inverse DCT operation for the pixel data in N×M unit blocks (N,M: arbitrary integers from 1 to 8) can be performed by a single DCT processor. Thereby, the circuit scale of the DCT processor is reduced.

Further, the DCT processor is provided with the first butterfly operation circuit 103 and the second butterfly operation circuit 106, and utilizes the butterfly operation when the value of N or M is a power of 2. Therefore, the operational complexity can be reduced by utilizing FFT for the matrix operation, and the quantity of data as the result of multiplication used for obtaining the result of DCT operation and the result of inverse DCT operation (i.e., the quantity of partial products obtained by multiplying the bit strings of N or M bits constituted by the respective bits of the input pixel data of N or M pixels by the coefficients used for obtaining the result of DCT operation and the result of inverse DCT operation) can be reduced, whereby the data capacities of the ROMs for storing this data can be reduced, resulting in a DCT processor of reduced circuit scale.
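The reason the butterfly operation shortens the bit strings can be checked numerically with the following sketch. It assumes an unnormalised DCT (scale factors omitted) and an even number of inputs; the function names `dct_direct` and `dct_butterfly` are hypothetical. Sums and differences are formed from both ends of the input toward the inside, so that each output depends on only half as many values.

```python
import math


def dct_direct(x):
    """Unnormalised one-dimensional DCT computed directly from all n inputs."""
    n = len(x)
    return [sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                for i in range(n))
            for k in range(n)]


def dct_butterfly(x):
    """Same transform, using sums for even outputs and differences for odd outputs."""
    n = len(x)
    half = n // 2
    s = [x[i] + x[n - 1 - i] for i in range(half)]   # additions, ends toward centre
    d = [x[i] - x[n - 1 - i] for i in range(half)]   # subtractions, ends toward centre
    out = []
    for k in range(n):
        src = s if k % 2 == 0 else d
        out.append(sum(src[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                       for i in range(half)))
    return out


x = [12, 200, 7, 45, 90, 3, 255, 18]
print(all(abs(a - b) < 1e-9 for a, b in zip(dct_direct(x), dct_butterfly(x))))
```

For eight inputs, each output is then formed from a 4-bit string rather than an 8-bit string, so a table of partial products needs 16 entries instead of 256.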

Further, although the quantity of data to be stored in the ROM0˜ROM7 can be reduced by using FFT, when the ROM0˜ROM7 are implemented as plural regions in a single ROM, the reduced data sections should be placed adjacent to one another to reduce the capacity of the entire ROM. However, when the data, the number of which is reduced by FFT, is stored in the ROM by simply packing the reduced data sections together, it becomes impossible to use the respective bits of the input data as an address, which is one of the features of the DA method. In this case, means for rearranging the bit strings obtained from the respective bits of the input pixel data is needed, resulting in complicated address generation. In contrast to this, according to the first embodiment, when the ROM address generation circuit 104 generates ROM addresses by adding header addresses to the bit strings obtained from the first butterfly operation circuit 103, the circuit 104 employs, as the header addresses, bit strings by which all of the generated addresses become continuous addresses. Thereby, the data can be mapped in the ROM with efficiency such that the bit strings from the first butterfly operation circuit 103 can be used as parts of the addresses and no useless area is generated in the ROM, whereby the capacity of the ROM can be reduced. As a result, the circuit scale of the DCT processor is further reduced.
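One way of choosing header addresses so that all tables occupy continuous addresses is sketched below. This is not the header table of the embodiment: the allocation order, the function name `allocate_headers`, and the table labels are assumptions, and the bit-string widths (7, 6, 5, and 3 bits without the butterfly operation; 4, 2, and 1 bits with it, each once for DCT and once for inverse DCT) are used only for illustration. Tables are placed largest first, so every table starts at a multiple of its own size and its upper address bits form a fixed header.

```python
def allocate_headers(blocks, width=9):
    """blocks: list of (name, nbits). Returns ({name: header string}, addresses used)."""
    headers = {}
    base = 0
    for name, nbits in sorted(blocks, key=lambda b: -b[1]):  # largest table first
        size = 1 << nbits
        if base % size != 0:
            raise ValueError("table is not aligned to its own size")
        # The bits above the table index are constant over the table: the header.
        headers[name] = format(base >> nbits, "0{}b".format(width - nbits))
        base += size
    if base > (1 << width):
        raise ValueError("tables do not fit in the address space")
    return headers, base


widths = [(7, 7), (6, 6), (5, 5), (3, 3), (8, 4), (4, 2), (2, 1)]  # (N, bit-string width)
blocks = [("N={} {}".format(n, mode), w) for mode in ("DCT", "IDCT") for n, w in widths]
headers, used = allocate_headers(blocks)
print(used)                 # prints 508
print(headers["N=7 DCT"])   # prints 00
```

In this sketch 508 of the 512 addresses of a 9-bit address space are occupied without gaps, and the sliced bit strings are still used directly as the lower address bits.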

Embodiment 2.

FIG. 10 illustrates a DCT processor according to a second embodiment of the present invention, and in the figure the same reference numerals as those shown in FIG. 1 designate the same or corresponding parts. A DCT processor 200 according to this second embodiment is identical to the DCT processor 100 of the first embodiment except that an output circuit 206 which outputs the data from the operation means according to the order of the input pixel data is provided instead of the second butterfly operation circuit, and only the DCT operation is performed. In the DCT processor 200, since the control signal outputted from the control circuit 101 does not include the value indicating that either the DCT operation or the inverse DCT operation is to be performed, it is not necessary to include the value indicating either the DCT operation or the inverse DCT operation in the ROM address generated by the ROM address generation circuit 104 and, therefore, the ROM address is an 8-bit address obtained by adding a header address indicating the value of N or M to the bit-by-bit-sliced input pixel data. Further, ROM10˜ROM17 are obtained by removing the areas for storing the data used for the inverse DCT operation, from the ROM0˜ROM7. The operation of this DCT processor 200 is identical to the operation for DCT of the DCT processor according to the first embodiment and, therefore, does not require repeated description.

Also in this second embodiment, as in the first embodiment, the butterfly operation performed by the first butterfly operation circuit 103 is controlled by the control signal indicating the value of N or M, and the ROM address generation circuit 104 generates the ROM address corresponding to the DCT operation for the pixel data in N or M unit blocks, on the basis of the output from the first butterfly operation circuit 103. Thereby, the DCT operation for the pixel data in N×M unit blocks having the number of rows or columns being an arbitrary integer from 1 to 8 is performed by a single DCT processor, resulting in a DCT processor having reduced circuit scale.

Further, since the butterfly operation is used when the value of N or M is a power of 2, the operational complexity can be reduced by utilizing FFT for the matrix operation, whereby the data capacity of the ROM for storing the data to be the result of multiplication which is used for obtaining the result of DCT operation can be reduced, resulting in a DCT processor having reduced circuit scale.

Further, the data can be mapped in the ROM with efficiency such that the bit strings obtained from the first butterfly operation circuit 103 can be used as parts of addresses and no useless area is generated in the ROM, whereby the capacity of the ROM is reduced, resulting in further reduction in the circuit scale of the DCT processor.

Embodiment 3.

FIG. 11 illustrates a DCT processor according to a third embodiment of the present invention, and in the figure the same reference numerals as those shown in FIG. 1 designate the same or corresponding parts. A DCT processor 300 according to this third embodiment is identical to the DCT processor 100 of the first embodiment except that no first butterfly operation circuit is provided, and the output from the bit slice circuit 102 is directly input to the ROM address generation circuit 104, and only the inverse DCT operation is performed. In the DCT processor 300, since the control signal outputted from the control circuit 101 does not include the value indicating that either the DCT operation or the inverse DCT operation is to be performed, it is not necessary to include the value indicating either the DCT operation or the inverse DCT operation in the ROM address generated by the ROM address generation circuit 104 and, therefore, the ROM address is an 8-bit address obtained by adding a header address indicating the value of N or M to the bit-by-bit-sliced input pixel data. Further, ROM20˜ROM27 are obtained by removing the areas for storing the data used for the DCT operation, from the ROM0˜ROM7. The operation of this DCT processor 300 is identical to the operation for inverse DCT of the DCT processor according to the first embodiment and, therefore, does not require repeated description.

Also in this third embodiment, as in the first embodiment, the butterfly operation performed by the second butterfly operation circuit 106 is controlled by the control signal indicating the value of N or M, and the ROM address generation circuit 104 generates the ROM address corresponding to the inverse DCT operation for the pixel data in N or M unit blocks, from the output of the bit slice circuit 102. Thereby, the inverse DCT operation for the pixel data in N×M unit blocks having the number of rows or columns being an arbitrary integer from 1 to 8 is performed by a single DCT processor, resulting in a DCT processor having reduced circuit scale.

Further, since the butterfly operation is used when the value of N or M is a power of 2, the operational complexity can be reduced by utilizing FFT for the matrix operation, whereby the data capacity of the ROM for storing the data to be the result of multiplication which is used for obtaining the result of inverse DCT operation can be reduced, resulting in a DCT processor having reduced circuit scale.

Further, the data can be mapped in the ROM with efficiency such that the bit strings obtained from the bit slice circuit 102 can be used as parts of addresses and no useless area is generated in the ROM, whereby the capacity of the ROM is reduced, resulting in further reduction in the circuit scale of the DCT processor.

Embodiment 4.

FIG. 9 is a block diagram illustrating the structure of a DCT processor according to a fourth embodiment of the present invention. In the figure, a bit slice circuit 112 is identical to the bit slice circuit 102 of the first embodiment except that it receives 16-bit pixel data, and slices the pixel data for every 2 bits as a unit. Further, a first butterfly operation circuit 113 is identical to the first butterfly operation circuit 103 of the first embodiment except that it performs butterfly operation for every two-bit data and outputs the data in units of two bits. A ROM address generation circuit 114 generates 9-bit addresses by adding header addresses to the addresses represented by the respective bits of the 2-bit data outputted from the first butterfly operation circuit 113. These header addresses are the same as the data used by the ROM address generation circuit 104 of the first embodiment, that is, bit strings such that all of the addresses obtained by adding them to the output of the first butterfly operation circuit 113 become continuous addresses. A RAC 115 comprises, like the RAC 105 of the first embodiment, a plurality of ROMs which hold, as tables, the partial products obtained by multiplying the bit strings obtained from the bit-unit outputs of the first butterfly operation circuit 113 by the coefficients of the matrix operations represented by formulae (5)˜(7), (9), (12)˜(14), (16), and (18)˜(23); and a plurality of accumulation circuits which accumulate the data outputted from the respective ROMs according to the addresses outputted from the ROM address generation circuit 114. However, since the bit slice circuit 112 slices the pixel data in units of two bits, each ROM requires two tables for separately holding the partial products corresponding to the two addresses obtained from the two bits. Therefore, in place of the ROM0˜ROM7 of the RAC 105 of the first embodiment, the RAC 115 comprises ROM0 a˜ROM7 a and ROM0 b˜ROM7 b of the same structure as the ROM0˜ROM7, respectively, and the corresponding ROMa and ROMb are arranged in parallel with each other. Although in this fourth embodiment the data in each of the ROM0 a˜ROM7 a and the ROM0 b˜ROM7 b is 16-bit data, the number of bits of this data is not restricted thereto. The accumulation circuits 52 a˜52 h receive, as 16-bit data, the outputs from ROM0 a and ROM0 b, the outputs from ROM1 a and ROM1 b, the outputs from ROM2 a and ROM2 b, the outputs from ROM3 a and ROM3 b, the outputs from ROM4 a and ROM4 b, the outputs from ROM5 a and ROM5 b, the outputs from ROM6 a and ROM6 b, and the outputs from ROM7 a and ROM7 b, respectively, and the accumulation circuits output the results of accumulation as the result of DCT operation when the DCT operation is performed or as the data to be input to the second butterfly operation circuit 116 to obtain the result of inverse DCT operation when the inverse DCT operation is performed. The second butterfly operation circuit 116 is identical to the second butterfly operation circuit 106 of the first embodiment except that it outputs 16-bit data.

In the DCT processor of this fourth embodiment, the pixel data, which is input to the processor for each row or column of the image data of an N×M unit block, is sliced for every two bits, and the first butterfly operation circuit 113 subjects the sliced two-bit data to the same butterfly operation as that described for the first embodiment when the DCT operation is to be performed and the N or M as the number of pixel data in the row or column to be input is a power of 2. In the cases other than mentioned above, no butterfly operation is performed. The ROM address generation circuit 114 adds header addresses to two pieces of bit strings each comprising the data of each one bit of the plural 2-bit outputs from the first butterfly operation circuit 113, thereby generating two addresses, and outputs one of them to the ROM0 a˜ROM7 a, and the other to the ROM0 b˜ROM7 b. The ROM0 a˜ROM7 a and the ROM0 b˜ROM7 b output the partial products to be used for the DCT operation or the inverse DCT operation corresponding to the input addresses, respectively. Each of the accumulation circuits 52 a˜52 h accumulates the outputs from the corresponding two ROMs placed in parallel, and outputs the result. When the inverse DCT is to be performed and the N or M as the number of pixel data in the input row or column is a power of 2, the second butterfly operation circuit 116 performs the same inverse DCT operation as described for the first embodiment on the data outputted from the accumulation circuits 52 a˜52 h, and rearranges the results of operation according to the order of the input pixel data. In the cases other than mentioned above, the circuit 116 performs no butterfly operation, and rearranges the data outputted from the accumulation circuits 52 a˜52 h according to the order of the input pixel data.
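The two-bit slicing with a pair of parallel tables can be modelled as in the sketch below. It is an illustration under several assumptions: samples are unsigned 16-bit values, header addresses are omitted, the names `build_rom` and `rac_two_bit` are hypothetical, and the doubling of the output of the table addressed by the upper bit of each pair is assumed to take place where the two table outputs are added.

```python
def build_rom(coeffs):
    """Table of partial products: entry a is the sum of coeffs[k] over set bits k of a."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
            for a in range(1 << n)]


def rac_two_bit(samples, rom_a, rom_b, nbits=16):
    """Accumulate two bit planes per step, using one table for each plane."""
    acc = 0
    for b in range(0, nbits, 2):
        addr_lo = addr_hi = 0
        for k, x in enumerate(samples):
            addr_lo |= ((x >> b) & 1) << k        # lower bit of each 2-bit slice
            addr_hi |= ((x >> (b + 1)) & 1) << k  # upper bit of each 2-bit slice
        pair = rom_a[addr_lo] + 2 * rom_b[addr_hi]  # assumed weighting of the upper bit
        acc += pair * (1 << b)
    return acc


coeffs = [3, -1, 4, 2]                 # hypothetical coefficients for one output value
samples = [40000, 123, 65535, 9]       # unsigned 16-bit inputs
rom = build_rom(coeffs)                # both tables have the same contents
print(rac_two_bit(samples, rom, rom) == sum(c * x for c, x in zip(coeffs, samples)))
```

Sixteen-bit data is thus processed in eight steps, the same number of steps as 8-bit data sliced bit by bit.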

According to this fourth embodiment, since the butterfly operation is performed when the N or M as the number of pixel data in the input row or column is a power of 2, FFT can be applied to the matrix operation, and the number of data used for obtaining the DCT operation and the inverse DCT operation, which data are to be stored in the ROM0 a˜ROM7 a and the ROM0 b˜ROM7 b, can be reduced, whereby the ROM capacity can be reduced as in the first embodiment.

Further, the header addresses are added to the addresses which are constituted by the respective bits of the plural pieces of two-bit pixel data outputted from the first butterfly operation circuit 113, such that all of the addresses are arranged continuously. Therefore, mapping of data in the ROM is performed with efficiency, and the ROM capacity is reduced also when the input pixel data has 16 bits, as in the first embodiment.

While in this fourth embodiment the DCT processor according to the first embodiment is modified so that it receives 16-bit data, the DCT processors according to the second and third embodiments may be modified so that they receive 16-bit data. Also in this case, the same effects as those provided by the fourth embodiment are achieved.

Further, while in the first to fourth embodiments the data input to the bit slice circuit has 8 bits or 16 bits, the number of bits of input pixel data is not restricted thereto, and the same effects as those obtained by the aforementioned embodiments are achieved by adjusting the unit of bits to be sliced by the bit slice circuit or the number of ROMs included in the RAC.

In the DCT processors according to the first to fourth embodiments, when the value of N or M is other than 8, i.e., when it is smaller than the upper limit, the operations of means to be unused, such as ROMs and accumulation circuits, may be halted. Thereby, the power consumption by the unnecessary means such as ROMs and accumulation circuits is reduced.

Further, while in the first to fourth embodiments ROMs are employed as means for outputting the result of multiplication, a combinational circuit which receives an address and outputs the result of multiplication corresponding to the address may be employed instead of the ROMs. Also in this case, the same effects as those provided by the aforementioned embodiments are achieved.

Further, while in the first to fourth embodiments the unit block of input image data is 8×8 pixels at the maximum, the maximum size of the unit block may be other than 8×8 pixels. Also in this case, the same effects as those provided by the aforementioned embodiments are achieved by increasing or decreasing the number of sets of ROMs and accumulation circuits, and the size of each ROM.

Applicability in Industry

As described above, the DCT processor according to the present invention is usable as a DCT processor included in a video data coding apparatus or a video data decoding apparatus and, particularly, it is suited for a DCT processor included in an apparatus performing coding or decoding based on MPEG (Moving Picture Experts Group).

What is claimed is:
 1. A DCT processor performing one-dimensional DCT operation or one-dimensional inverse DCT operation on pixel data of image data in unit blocks each comprising N×M pixels (N,M: arbitrary integers from 1 to 8), comprising: bit slice means for receiving the pixel data of the image data in each N×M unit block for each row or column, and slicing, bit by bit, the respective pixel data constituting the input rows or columns, and outputting the sliced pixel data; control means for outputting a control signal which includes the number of input pixel data that is the number of pixel data constituting each input row or column, and a value indicating that either the DCT operation or the inverse DCT operation is to be performed; first butterfly operation means for subjecting the output data from the bit slice means to the butterfly operation and outputting the result of the butterfly operation in the case where the control signal outputted from the control means indicates that the number of input pixel data is a power of 2 and that the DCT operation is to be performed, and in the cases other than mentioned above, said first butterfly operation means performing no butterfly operation and outputting the output data of the bit slice means as it is; address generation means for generating addresses on the basis of bit strings obtained from the output data of the first butterfly operation means, and the number of input pixel data and the value indicating that either the DCT operation or the inverse DCT operation is to be performed, which are included in the control signal; operation means having eight sets of multiplication result output means and accumulation means, said multiplication result output means outputting the results of multiplication to be used for obtaining the results of the one-dimensional DCT and inverse DCT operations, in accordance with the above-described addresses, and said accumulation means accumulating the output data from the multiplication result output means and outputting the accumulated data; and second butterfly operation means for subjecting the output data from the operation means to the butterfly operation and outputting the result of the butterfly operation after rearranging it according to the order of input pixel data in the case where the control signal outputted from the control means indicates that the number of input pixel data is a power of 2 and that the inverse DCT operation is to be performed, and in the cases other than mentioned above, said second butterfly operation means performing no butterfly operation and outputting the output data of the operation means after rearranging it according to the order of input pixel data.
 2. A DCT processor as described in claim 1 wherein, on the basis of the output data from the first butterfly operation means, and the number of input pixel data, and the value indicating that either the DCT operation or the inverse DCT operation is to be performed, said address generation means generates addresses as follows: when the control signal indicates that the number of input pixel data is any of 7, 6, 5, and 3, said address generation means generates an address by adding a header address of 2 bits, 3 bits, 4 bits, or 6 bits which indicates the value of the number of input pixel data including the value indicating either the DCT operation or the inverse DCT operation, to a bit string of 7 bits, 6 bits, 5 bits, or 3 bits which is constituted based on the output data from the first butterfly operation means, respectively; when the control signal indicates that the number of input pixel data is any of 8, 4, and 2 and the DCT operation is to be performed, said address generation means generates an address by adding a header address of 5 bits, 7 bits, or 8 bits which indicates the value of the number of input pixel data including the value indicating that the DCT operation is to be performed, to a bit string of 4 bits, 2 bits, or 1 bit which is constituted based on the result of addition obtained in the butterfly operation by the butterfly operation means, and to a bit string of 4 bits, 2 bits, or 1 bit which is constituted based on the result of subtraction obtained in the butterfly operation, respectively; when the control signal indicates that the number of input pixel data is any of 8, 4, and 2 and the inverse DCT operation is to be performed, said address generation means generates an address by adding a header address of 5 bits, 7 bits, or 8 bits which indicates the value of the number of input pixel data including the value indicating that the inverse DCT operation is to be performed, to a bit string of 4 bits, 2 bits, or 1 bit which is constituted based on the output of 8 bits, 4 bits, or 2 bits from the first butterfly operation means, respectively; and said header addresses are bit strings which permit all of the addresses obtained by adding the header addresses to the addresses based on the output data from the first butterfly operation means, to become continuous addresses.
 3. A DCT processor as described in claim 1,wherein said multiplication result output means outputs the results ofmultiplication as follows: when the control signal outputted from thecontrol means indicates that the number of input pixel data is a powerof 2 and the DCT operation is to be performed, said multiplicationresult output means outputs the result of multiplication with respect tothe bit strings obtained from the output data of the first butterflyoperation means, according to the DCT matrix operation using fastFourier transform; when the control signal outputted from the controlmeans indicates that the number of input pixel data is a value otherthan a power of 2 and the DCT operation is to be performed, saidmultiplication result output means outputs the result of multiplicationwith respect to the bit strings obtained from the output data of thefirst butterfly operation means, according to the DCT matrix operation;when the control signal outputted from the control means indicates thatthe number of input pixel data is a power of 2 and the inverse DCToperation is to be performed, said multiplication result output meansoutputs the result of multiplication with respect to the bit stringsobtained from the output data of the first butterfly operation means,according to the inverse DCT matrix operation using fast Fouriertransform; and when the control signal outputted from the control meansindicates that the number of input pixel data is a value other than apower of 2 and the inverse DCT operation is to be performed, saidmultiplication result output means outputs the result of multiplicationwith respect to the bit strings obtained from the output data of thefirst butterfly operation means, according to the inverse DCT matrixoperation.
 4. A DCT processor as described in claim 1 wherein, when the control signal indicates that the number of input pixel data is a value other than 8, the operation of means which is not used for the operation is halted.
 5. A DCT processor as described in claim 1 wherein: said bit slice means receives 16-bit data as each pixel data to be input, slices this 16-bit data for every two bits, and outputs the sliced data; and said operation means is provided with, as each of the multiplication result output means, two multiplication result output units placed in parallel with each other, each outputting the result of multiplication, and data obtained by adding the outputs of the two multiplication result output units is accumulated by the corresponding accumulation means.
 6. A DCT processor performing one-dimensional DCT operation on pixel data of image data in unit blocks each comprising N×M pixels (N,M: arbitrary integers not less than 1), comprising: bit slice means for receiving the pixel data of the image data in each N×M unit block for each row or column, and slicing, bit by bit, the respective pixel data constituting the input rows or columns, and outputting the sliced pixel data; control means for outputting a control signal which indicates the number of input pixel data that is the number of pixel data constituting each input row or column; butterfly operation means for performing butterfly operation on the output data from the bit slice means and outputting the result of the butterfly operation in the case where the control signal outputted from the control means indicates that the number of input pixel data is a power of 2, and in the cases other than mentioned above, said butterfly operation means performing no butterfly operation and outputting the output data of the bit slice means as it is; address generation means for generating addresses by using bit strings obtained from the output data of the first butterfly operation means, and the number of input pixel data included in the control signal; operation means having plural sets of multiplication result output means and accumulation circuits, as many as the maximum value of the number of input pixel data, said multiplication result output means outputting the results of multiplication to be used for obtaining the result of one-dimensional DCT operation, in accordance with the above-described addresses, and said accumulation circuits accumulating the results of multiplication outputted from the respective multiplication result output means and outputting the accumulated results; and output means for rearranging the output data of the operation means according to the order of input pixel data, and outputting the rearranged data as the result of one-dimensional DCT operation.
 7. A DCT processor as describedin claim 6 wherein, on the basis of the output data from the firstbutterfly operation means and the number of input pixel data, saidaddress generation means generates addresses as follows: when thecontrol signal indicates that the number of input pixel data is a valueother than a power of 2, said address generation means generates anaddress by adding a header address for indicating the number of inputpixel data, to an address having the number of bits equal to the numberof input pixel data, which is constituted based on the output data fromthe first butterfly operation means; when the control signal indicatesthat the number of input pixel data is a power of 2, said addressgeneration means generates an address by adding a header address forindicating the number of input pixel data, to a bit string having thenumber of bits equal to half of the number of input pixel data, which isconstituted based on the result of the addition obtained in thebutterfly operation by the first butterfly operation means, and to a bitstring having the number of bits equal to half of the number of inputpixel data, which is constituted based on the result of the subtractionobtained in the butterfly operation; and said header addresses are bitstrings which permit all of the addresses obtained by adding the headeraddresses to the addresses based on the output data from the firstbutterfly operation means, to become continuous addresses and have thenumber of bits equal to the maximum value of the number of input pixeldata.
 8. A DCT processor as described in claim 7, wherein: the unitblock of the image data to be input to the bit slice means is a unitblock each comprising N×M pixels (N,M: arbitrary integers from 1 to 8);and said operation means includes eight sets of multiplication resultoutput means and accumulation means, which is equal to the maximum valueof the number of input pixel data.
 9. A DCT processor as described inclaim 6, wherein said butterfly operation means performs butterflyoperation for outputting the values obtained by sequentially adding andsubtracting the pixel data, which have been input for each row or columnto the bit slice means and sliced bit by bit to be output, starting fromthe both ends of the input row or column toward the inside.
 10. A DCTprocessor as described in claim 9, wherein: the unit block of the imagedata to be input to the bit slice means is a unit block each comprisingN×M pixels (N,M: arbitrary integers from 1 to 8); and said operationmeans includes eight sets of multiplication result output means andaccumulation means, which is equal to the maximum value of the number ofinput pixel data.
 11. A DCT processor as described in claim 6, whereinsaid multiplication result output means outputs the result ofmultiplication as follows: when the control signal outputted from thecontrol means indicates that the number of input pixel data is a powerof 2, said multiplication result output means outputs the result ofmultiplication with respect to the bit strings obtained from the outputdata of the first butterfly operation means according to the DCT matrixoperation using fast Fourier transform; and when the control signalindicates that the number of input pixel data is a value other than apower of 2, said multiplication result output means outputs the resultof multiplication with respect to the bit strings obtained from theoutput data of the first butterfly operation means according to the DCTmatrix operation.
 12. A DCT processor as described in claim 11, wherein:the unit block of the image data to be input to the bit slice means is aunit block each comprising N×M pixels (N,M: arbitrary integers from 1 to8); and said operation means includes eight sets of multiplicationresult output means and accumulation means, which is equal to themaximum value of the number of input pixel data.
 13. A DCT processor as described in claim 6 wherein: the unit block of the image data to be input to the bit slice means is a unit block each comprising N×M pixels (N,M: arbitrary integers from 1 to 8); and said operation means includes eight sets of multiplication result output means and accumulation means, which is equal to the maximum value of the number of input pixel data.
 14. A DCT processor as described in claim 6 wherein, when the control signal indicates that the number of input pixel data is equal to a value other than the maximum value of the number of input pixel data, the operation of means to be unused is halted.
 15. A DCT processor as described in claim 6, wherein: said bit slice means receives 16-bit data as each pixel data to be input, slices this 16-bit data for every two bits, and outputs the sliced data; and said operation means is provided with, as each of the multiplication result output means, two multiplication result output units placed in parallel with each other, each outputting the result of multiplication, and data obtained by adding the outputs of the two multiplication result output units is accumulated by the corresponding accumulation means.
 16. A DCT processor performing one-dimensional inverse DCT operation on pixel data of image data in unit blocks each comprising N×M pixels (N,M: arbitrary integers not less than 1), comprising: bit slice means for receiving the pixel data of the image data in each N×M unit block for each row or column, and slicing, bit by bit, the respective pixel data constituting the input rows or columns, and outputting the sliced pixel data; control means for outputting a control signal which includes the number of input pixel data that is the number of pixel data constituting each input row or column; address generation means for generating addresses using bit strings obtained from the output data of the bit slice means, and the number of input pixel data included in the control signal; operation means having plural sets of multiplication result output means and accumulation circuits, as many as the maximum value of the number of input pixel data, said multiplication result output means outputting the results of multiplication to be used for obtaining the result of one-dimensional DCT operation in accordance with the above-described addresses, and said accumulation circuits accumulating the results of multiplication outputted from the respective multiplication result output means and outputting the accumulated results; and butterfly operation means for performing butterfly operation on the output data from the operation means and outputting the result of the butterfly operation after rearranging it according to the order of input pixel data in the case where the control signal outputted from the control means indicates that the number of input pixel data is a power of 2, and in the cases other than mentioned above, said butterfly operation means performing no butterfly operation and outputting the output data of the operation means after rearranging it according to the order of input pixel data.
 17. A DCT processor as described in claim 16 wherein, on the basis of the output data from the bit slice means and the number of input pixel data, said address generation means generates addresses as follows: when the control signal indicates that the number of input pixel data is a value other than a power of 2, said address generation means generates an address by adding a header address for indicating the number of input pixel data, to an address having the number of bits equal to the number of input pixel data, which is constituted based on the output data of the bit slice means; when the control signal indicates that the number of input pixel data is a power of 2, said address generation means generates an address by adding a header address for indicating the number of input pixel data, to a bit string having the number of bits equal to half of the number of input pixel data, which is constituted based on the output data from the bit slice means; and said header addresses are bit strings which permit all of the addresses obtained by adding the header addresses to the addresses based on the output data of the bit slice means to become continuous addresses and have the number of bits equal to the maximum value of the number of input pixel data constituting the input row or column.
 18. A DCT processor as described in claim 16, wherein said butterfly operation means performs butterfly operation for outputting the value obtained by addition and the value obtained by subtraction, which addition and subtraction are performed between the value obtained by accumulating the result of multiplication based on the odd-numbered pixel data amongst the pixel data input for each row or column, and the value obtained by accumulating the result of multiplication based on the even-numbered pixel data.
 19. A DCT processor as described in claim 16, wherein said multiplication result output means outputs the result of multiplication as follows: when the control signal outputted from the control means indicates that the number of input pixel data is a power of 2, said multiplication result output means outputs the result of multiplication with respect to the bit strings obtained from the output data of the first butterfly operation means according to the inverse DCT matrix operation using fast Fourier transform; and when the control signal indicates that the number of input pixel data is a value other than a power of 2, said multiplication result output means outputs the result of multiplication with respect to the bit strings obtained from the output data of the first butterfly operation means according to the inverse DCT matrix operation.
 20. A DCT processor as described in claim 16, wherein: the unit block of the image data to be input to the bit slice means is a unit block each comprising N×M pixels (N,M: arbitrary integers from 1 to 8); and said operation means includes eight sets of multiplication result output means and accumulation means, which is equal to the maximum value of the number of input pixel data.
 21. A DCT processor as described in claim 16, wherein: said bit slice means receives 16-bit data as each pixel data to be input, slices this 16-bit data for every two bits, and outputs the sliced data; and said operation means is provided with, as each of the multiplication result output means, two multiplication result output units placed in parallel with each other, each outputting the result of multiplication, and data obtained by adding the outputs of the two multiplication result output units is accumulated by the corresponding accumulation means.
 22. A DCT processor as described in claim 16, wherein, when the control signal indicates that the number of input pixel data is equal to a value other than the maximum value of the number of input pixel data, the operation of means to be unused is halted.
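
By way of illustration only, and not as part of any claim, the cooperation of the butterfly operation, the bit slicing, the table lookup and the accumulation recited above may be sketched in software for the case of eight input pixel data. The sketch assumes an orthonormal 8-point DCT-II scaling, 16-bit two's-complement input words processed one bit per step, and a single butterfly stage; the function names (butterfly, build_rom, da_dot, dct8_da, dct8_direct) are hypothetical, and the table built by build_rom merely stands in for the ROM contents, which the claims do not enumerate.

```python
import math


def butterfly(x):
    """Add and subtract pairs taken from both ends of the row toward the inside."""
    n = len(x)
    half = n // 2
    sums = [x[i] + x[n - 1 - i] for i in range(half)]
    diffs = [x[i] - x[n - 1 - i] for i in range(half)]
    return sums, diffs


def build_rom(coeffs):
    """Lookup table: entry `addr` holds the sum of coeffs[k] for every set bit k of addr."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_dot(rom, values, bits=16):
    """Bit-serial dot product: one table read and one weighted accumulation per bit plane."""
    acc = 0.0
    for b in range(bits):
        # Form the address from bit b of every input word (the bit slice).
        addr = 0
        for k, v in enumerate(values):
            addr |= ((v >> b) & 1) << k
        # The top bit of a two's-complement word carries a negative weight.
        weight = -(1 << b) if b == bits - 1 else (1 << b)
        acc += weight * rom[addr]
    return acc


def dct_scale(k):
    """Orthonormal DCT-II scale factor for N = 8 (an assumption of this sketch)."""
    return math.sqrt(1 / 8) if k == 0 else 0.5


def dct8_da(x, bits=16):
    """8-point DCT: butterfly first, then 4-bit-addressed lookups and accumulation."""
    sums, diffs = butterfly(x)
    out = []
    for k in range(8):
        coeffs = [dct_scale(k) * math.cos((2 * n + 1) * k * math.pi / 16)
                  for n in range(4)]
        # Even-indexed outputs depend only on the sums, odd-indexed on the differences.
        src = sums if k % 2 == 0 else diffs
        out.append(da_dot(build_rom(coeffs), src, bits))
    return out


def dct8_direct(x):
    """Reference: plain 8x8 matrix-vector DCT used only to check the sketch."""
    return [dct_scale(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16)
                               for n in range(8))
            for k in range(8)]


if __name__ == "__main__":
    row = [52, -3, 17, 100, -128, 127, 0, 64]
    for fast, ref in zip(dct8_da(row), dct8_direct(row)):
        print(f"{fast:12.6f} {ref:12.6f}")
```

In this sketch the butterfly halves the number of values feeding each lookup, so a 4-bit address suffices for eight input pixel data, which is in keeping with the half-length bit string recited for the power-of-2 case. Input values should be small enough that the sums and differences still fit in the assumed 16-bit word.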